FOREWORD

The basic purpose of this document is to provide guidelines for SSPO personnel and
contractors in the evaluation and documentation of SWS program life cycle reliability and
availability. Each section of the document achieves this purpose by providing a detailed
discussion of the various reliability and availability topics and techniques necessary for an
evaluation program. Sections 5 (Assessment of Component Reliability), 6 (Software Evaluation)
and 7 (Assessing System Reliability and Availability), while providing detailed discussions and
numerous examples, deal with preliminary, incomplete and rapidly growing topics. These sections
may also be controversial due to the discussions of Bayesian Methods and Duane or software
models which have not been sufficiently tested on all classes of data to have gained general
acceptance. However, these sections provide guidelines for quantitative data analysis and an
orientation for reliability and availability analysts and other interested program personnel. If
the analyst is reasonably sure that the distribution being evaluated is not exponential, then
Sections 5 and 7 contain references to other material which would be needed to handle
state-of-the-art problems.

Originally published in 1965, NAVWEPS OD 29304 [1] prescribed uniform procedures for
measuring the reliability of subsystems, beginning in research and development. The manual's
purpose was to provide a practical method for making reliability measurements, measurements
directly related to mission requirements and useful in both development and operational
program phases. Following its introduction the manual gained acceptance, particularly after
independent studies confirmed the accuracy and validity of its techniques, even in conditions
where the quality of input data was severely limited. The manual's techniques represented the
"state of the art" in reliability analysis at that time. The methodology was based on "classical"
methods, as any subjective or pre-existing knowledge of hardware characteristics was excluded
from consideration and analysis was based solely on test results.

The manual was first revised in 1973 [3]. Bayesian statistical methods, which permitted
formal inclusion of prior information directly with test data for economical evaluation
programs, were included in this revision. In many instances reliability is more efficiently defined
in terms of performance variables, rather than in terms of absolute success or failure. Here the
methods of variables statistics permit more efficient usage of test results, with consequent cost
reduction to the program budget. Variables methods compatible with the Basic and the
Bayesian Method of the OD were also incorporated in 1973. In addition to including these
analytical methods, new material was added in 1973 covering reliability prediction and
apportionment.

The treatment of data system requirements was expanded in 1973 to accommodate
evaluation of modified and operational systems in development. Tests specifically performed for
reliability demonstration were discussed and the conditions under which they may be desirable
were considered. An effort was made to amplify the presentation throughout, to incorporate
aspects of operational readiness evaluation contained in NAVORD OD 43251 [4], and to
maintain compatibility with NAVORD OD 42282 [5], Integrated Test Program Manual.

This latest revision has been prepared to update and expand both the methods and scope of
NAVORD OD 29304A [3], and to address the reliability and availability evaluation
requirements of NAVSEA OD 21549 [6]. The section on methods (Section 5) for assessing reliability
of system elements has been expanded to cover a wider variety of statistical models and to
incorporate reliability growth models. Section 6 has been included to cover reliability
evaluation of software. The section on system assessment (Section 7) has been expanded to provide
procedures for combining the expanded hardware models and the software models to yield
system mission-phase reliability and availability values. Section 8 has been added to cover
reliability demonstration. The material in all other sections has been updated in order to be
compatible with the additions described above. Fault tree analysis has been included in the
section on reliability analysis as a procedure for evaluating designs considering undesirable
events in an operational environment.

This document supersedes NAVORD OD 29304A [3] and NAVORD OD 43251 [4]. The
Statistical Addendum [2], NAVORD OD 29304/Addendum, has not been superseded.
GLOSSARY


ACCURACY
A term denoting the closeness of measured values to the true value of a quantity, taking
into consideration both systematic error (bias) and random error (variance).

ALERT TIME
Time during which a system is on station ready for operation.

ALGORITHM
A fixed step-by-step procedure to carry out a given computational or logical formulation.

ALLOCATION
(See APPORTIONMENT)

APPORTIONMENT
A process of assigning goals or requirements to items in a system in accordance with a
logical scheme.

ASSESSMENT
The use of test data and/or operational service data to form estimates of population
parameters and to evaluate the precision of those estimates (synonym - Estimation).

ATTRIBUTE
A characteristic or property that is appraised in terms of whether it does (or does not)
exist with respect to a given requirement.

AVAILABILITY, A
Availability is equivalent to Operational Readiness Reliability in this manual.

AVAILABILITY, APPARENT
For an item checked out at intervals, the quotient of apparent up time divided by the sum
of apparent up time plus apparent down time. Apparent availability is greater than
operational availability when failure detection is not immediate.

AVAILABILITY, INHERENT, A_i
Availability with respect to failure only, under ideal support conditions; an intrinsic
hardware characteristic. It is estimated by the ratio of total operating time plus total
alert time to the sum of total operating time plus total alert time plus total corrective
maintenance time.

AVAILABILITY, INTERVAL, A_T
The time average of pointwise availability over intervals of stated length, T.


AVAILABILITY, OPERATIONAL, A_o
Availability in the actual operating environment; a function of facility characteristics
as well as hardware. It is estimated as the ratio of operating time plus alert time to
total calendar time. Equivalent to operational readiness.

AVAILABILITY PHASE
Any phase of a mission when an availability figure-of-merit applies, i.e., failures are
permissible if system is up when needed.

AVAILABILITY, POINTWISE, A(t)
The probability that an item will be operable at a stated instant in time.

AVAILABILITY, STEADY STATE, A
The limit of interval availability as time increases without limit (T → ∞). It is
estimated by the up time ratio or the expression MTBF / (MTBF + MTTR).
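The estimators named in the availability entries above can be sketched in a few lines. This is only an illustration: the function names and the sample hours are assumptions made for the example, not values or notation taken from this manual.

```python
# Illustrative sketch of the availability estimators described in the glossary.
# Function names and all numeric inputs are hypothetical.

def inherent_availability(operating_time, alert_time, corrective_maintenance_time):
    """(operating + alert) / (operating + alert + corrective maintenance)."""
    up = operating_time + alert_time
    return up / (up + corrective_maintenance_time)


def operational_availability(operating_time, alert_time, calendar_time):
    """(operating + alert) / total calendar time."""
    return (operating_time + alert_time) / calendar_time


def steady_state_availability(mtbf, mttr):
    """MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


# Hypothetical patrol record, all times in hours.
a_i = inherent_availability(400.0, 1500.0, 100.0)
a_o = operational_availability(400.0, 1500.0, 2160.0)
a_ss = steady_state_availability(190.0, 10.0)
print(f"inherent={a_i:.4f} operational={a_o:.4f} steady-state={a_ss:.4f}")
```

With these made-up inputs the operational figure is lower than the inherent figure because calendar time includes delays that the inherent estimate excludes.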


BAYES' EQUATION
$$P(A \mid B) = P(A)\,\frac{P(B \mid A)}{P(B)}$$
where the various terms may be defined and illustrated as follows:

A = A hypothesis or statement of belief, for example the failure rate is .007 failures
per million hours.

B = Evidence such as a test result, bearing upon the truth or credibility of the
hypothesis; for example the component experienced no failures in 1,000 hours of
simulated mission testing.

P(A) = The prior probability or degree of belief in the truth of hypothesis A before
test information B becomes available.

P(A|B) = The posterior or updated measure of belief in hypothesis A given the impact of
evidence B.

P(B|A) = The likelihood or probability of the evidence given the truth of the hypothesis.

P(B) = The probability of the evidence B evaluated over the ensemble of possible
hypotheses A.

BAYESIAN METHODS
Statistical procedures that allow information available prior to testing to be combined
with test data by means of Bayes' equation.

BIAS
A measure of error that occurs systematically in a series of measurements. The
difference between a true value and the limiting mean of repeated measurements is
termed bias.

BINOMIAL DISTRIBUTION
Consider a series of n independent events, each event having only two possible states
with probabilities of occurrence P and 1-P. If n is fixed, a random variable X is said
to have a Binomial Distribution with parameter P when

$$P\{X = x\} = \binom{n}{x} P^{x} (1-P)^{n-x}, \qquad x = 0, 1, 2, \ldots, n$$
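As a hedged illustration of the BAYES' EQUATION and BINOMIAL DISTRIBUTION entries, the sketch below updates a discrete prior over two candidate success probabilities using a binomial likelihood. The two hypotheses, the equal prior weights, and the test outcome (20 successes in 20 trials) are assumptions chosen for the example.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P{X = x} for n independent trials, each succeeding with probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def posterior(prior, n, x):
    """Apply Bayes' equation over a discrete set of hypotheses.

    prior maps each hypothesised success probability to its prior belief P(A).
    """
    joint = {p: belief * binomial_pmf(x, n, p) for p, belief in prior.items()}
    evidence = sum(joint.values())  # P(B), summed over all hypotheses
    return {p: j / evidence for p, j in joint.items()}


# Hypothetical prior: the item's success probability is either 0.90 or 0.99.
prior = {0.90: 0.5, 0.99: 0.5}
post = posterior(prior, n=20, x=20)
for p, belief in sorted(post.items()):
    print(f"P(success prob = {p:.2f} | data) = {belief:.4f}")
```

A failure-free test shifts belief toward the higher success probability, and the size of the shift depends on both the prior and the number of trials.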
COMPONENT
The first indenture level below an equipment. A combination of parts, devices and
structure, usually self contained, which performs a distinct function (acts on one or
more inputs to produce appropriate outputs) in the operation of an equipment; for
example, a converter, a gas generator, an amplifier.

CONFIDENCE COEFFICIENT
A measure of assurance that a statement based upon statistical (frequency) data is
correct. The probability that an unknown parameter lies within a stated interval or is
greater than or less than some stated value.

CONFIDENCE INTERVAL
A region within which an unknown parameter is said to lie with stated probability. The
region is two sided when both upper and lower limits are specified. It is one sided when
only the upper or the lower limit is specified.

CONFIDENCE LIMIT
A bound of a confidence interval.

CONSTANT FAILURE RATE (CFR)
Characterizes an item with constant (instantaneous) hazard rate h(t).

CUMULATIVE DISTRIBUTION FUNCTION (CDF)
The probability, F(x), that a random variable X takes a value less than or equal to x.

CUT
A set of items in a higher level assembly which, if all are failed, indicate the higher
level assembly has also failed.

DECREASING FAILURE RATE (DFR)
Characterizes an item with decreasing hazard rate. This may occur, for instance, during
the early part of the life of an item.

DEGREE OF BELIEF
A Bayesian term associated with the probability of a hypothesis.

DEMONSTRATION
Formal measurement of system characteristics with statistical confidence by testing or
operation.

DENSITY FUNCTION
See Probability Density Function.

DESIGN OBJECTIVE
A desired value or goal relative to a stated device parameter or characteristic,
established as guidance for designers.

DESIGN REQUIREMENT
A required value relative to a stated device parameter or characteristic. In the case of
RMA design requirements, normal practice is to specify a design requirement along with
an appropriate producer's risk and a minimum acceptable value along with an appropriate
consumer's risk.
EQUIPMENT
The first indenture level below a subsystem.

ESTIMATION
The use of test and/or operational data to form estimates of population parameters and
to evaluate the precision of those estimates (synonyms - Assessment and Measurement).

EVALUATION
A broad term used to encompass prediction, measurement, and demonstration.

EXPECTED VALUE
The first moment about the origin of a probability distribution. Also called mean. May
be estimated by the arithmetic average.

EXPONENTIAL DISTRIBUTION
A probability distribution having the density function $f(t) = \lambda e^{-\lambda t}$
where $\lambda$, the failure rate, is a constant. Under very general conditions it is
the distribution of time between successive failures of complex systems.

EXPONENTIAL MODEL
In reliability engineering, a model based on the assumption that times t between
successive failures are described by the exponential distribution.

FAILURE
Performance below a specified minimum level or outside a specified tolerance interval.

FAILURE, NON-RELEVANT
Failure not applicable to the computation of reliability.

FAILURE, RELEVANT
Failure attributable to a deficiency of design, manufacture or materials of the failed
device; applicable to the computation of reliability.

FAILURE RATE
When not further qualified, denotes hazard rate.

FBM WEAPON SYSTEM
The SSBN submarine, together with its supporting tactical subsystems - missile, fire
control, guidance, MTRE, navigation, launcher, and ship support.

FIGURE OF MERIT
An index or quantitative measure of merit used to characterize an item for analysis or
comparison.

FIRMWARE
Denotes a logical element performing like software but built as hardware which is part
of a system or computer.

HARDWARE
A general term denoting physical elements of a system.
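The EXPONENTIAL DISTRIBUTION entry above can be illustrated with a short sketch. The survival function R(t) = exp(-λt) used here follows from integrating the stated density; the failure rate of 0.002 per hour is an assumed value for the example only.

```python
import math


def exponential_pdf(t, failure_rate):
    """Density f(t) = lambda * exp(-lambda * t) for t >= 0."""
    return failure_rate * math.exp(-failure_rate * t)


def exponential_reliability(t, failure_rate):
    """Probability of surviving beyond t: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * t)


def hazard_rate(t, failure_rate):
    """h(t) = f(t) / R(t); constant for the exponential distribution."""
    return exponential_pdf(t, failure_rate) / exponential_reliability(t, failure_rate)


lam = 0.002  # hypothetical failure rate, failures per hour
for t in (0.0, 100.0, 500.0):
    print(f"t={t:6.1f} h  R(t)={exponential_reliability(t, lam):.4f}  "
          f"h(t)={hazard_rate(t, lam):.4f}")
```

The hazard rate prints the same value at every time, which is the constant failure rate property the glossary associates with this distribution.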
HAZARD RATE, h(t)
Also called the conditional failure density, represents the probability that an item
still functioning at time t will fail in the interval (t, t + Δt), where Δt is an
infinitesimal time increment. Hazard rate is synonymous to conditional failure rate or
instantaneous failure rate.

HYPOTHESIS
In Statistics, an assertion about the value or distribution of one or more random
variables.

HYPOTHESIS TESTING
The use of test data to assess the weight of evidence for or against a hypothesis and to
accept or reject the hypothesis based on statistical decision rules.

INCREASING FAILURE RATE (IFR)
Increasing hazard rate generally characteristic of limited life items.

INDENTURE LEVELS
A hierarchical structure of hardware complexity:
    System
    Subsystem
    Equipment
    Component
    Module
    Part or Component Part

INTERVAL ESTIMATE
An interval asserted to enclose a defined set with stated probability. Examples: a
confidence interval to include a parameter, a prediction interval to include a set of
observations, a tolerance interval to include a population fraction.

ITEM
A general term denoting physical elements of a system.

LOG NORMAL DISTRIBUTION
Statistical distribution which characterizes times to failure of items displaying
normally distributed logarithms of times to failure.

MAINTAINABILITY
A measure of the ability of an item to be maintained. Mean preventive maintenance time,
mean repair time, and mean down time are commonly used indices of maintainability. (The
often-encountered definition of maintainability as the probability of repair within a
stated time is not used because that probability is not used in the availability
expressions and computations in this manual.)

MAINTENANCE
All actions necessary for retaining an item in, or restoring it to, a specified
condition.
MAINTENANCE CONSTRAINTS
Limitations on the quantity and/or quality of maintenance available to a system in use.

MAINTENANCE, CORRECTIVE
Unscheduled actions performed, as a result of an item failure, to repair an item and
restore it to a specified condition.

MAINTENANCE, PREVENTIVE
Actions performed on a scheduled or routine basis in an attempt to retain an item in a
specified condition by providing systematic inspection, detection and prevention of
incipient failure.

MAXIMUM LIKELIHOOD METHOD
A method which consists of expressing the likelihood function of a set of failure times,
or other data, and determining, from the maximum value of the likelihood, a "best"
estimate of one or several parameters of the statistical distribution which
characterizes the data.

MEAN, μ
The first moment of a probability distribution about its origin; the expected value of a
random variable. The mean is the most commonly used measure of central tendency.
Estimated by an arithmetic average.

MEAN DOWN TIME
The expected or average down time.

MEAN-TIME-BETWEEN-FAILURES (MTBF)
The same as Mean-Time-To-Failure for non-repairable items. It is often employed
specifically for repairable items when it denotes the mean time (or cycles, miles,
events) between successive failures. MTBF is often estimated by dividing the total
operating time for like items by the total number of failures encountered.

MEAN-TIME-TO-FAILURE (MTTF)
In a component or a system, the mean time to first failure. If f(t) is the Probability
Density Function of the component, or system, then
$\mathrm{MTTF} = \int_0^\infty t\,f(t)\,dt = \int_0^\infty R(t)\,dt$. Similar formulas
hold if measure of life units are cycles, miles, events, etc. rather than time. For
repairable exponential elements, MTTF denotes also the mean time to successive failures.

MEAN-TIME-TO-REPAIR (MTTR)
The mathematical expectation of time to repair an item. Often estimated as the total
repair time divided by the total number of repair actions during a given time period.

MEASUREMENT
Evaluation of the characteristics of an item by observation of its performance in test
or operational service.
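The MTBF estimate and the MTTF integral described in the entries above can be checked numerically. The sketch below is illustrative only: the trapezoidal rule, the finite upper limit standing in for infinity, and the sample figures are all assumptions made for the example.

```python
import math


def mtbf_estimate(total_operating_time, failures):
    """Total operating time for like items divided by the number of failures."""
    if failures <= 0:
        raise ValueError("at least one failure is needed for this estimate")
    return total_operating_time / failures


def mttf_from_reliability(reliability, upper_limit, steps=100_000):
    """Approximate the integral of R(t) from 0 to upper_limit (trapezoidal rule).

    upper_limit must be large enough that R(t) is negligible beyond it.
    """
    h = upper_limit / steps
    total = 0.5 * (reliability(0.0) + reliability(upper_limit))
    for i in range(1, steps):
        total += reliability(i * h)
    return total * h


lam = 0.01  # hypothetical constant failure rate, failures per hour
mttf = mttf_from_reliability(lambda t: math.exp(-lam * t), upper_limit=2000.0)
print(f"numerical MTTF = {mttf:.3f} h, 1/lambda = {1 / lam:.3f} h")
print(f"MTBF estimate  = {mtbf_estimate(5000.0, 4):.1f} h")
```

For the exponential case the numerical integral agrees closely with 1/λ, consistent with the glossary's remark that MTTF and the mean time between successive failures coincide for exponential elements.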
MINIMAL CUT
A cut from which no item can be removed while maintaining a cut.

MODULE
The first indenture level below a component. An onboard-replaceable item; for example, a
Type 3 Module.

NORMAL DISTRIBUTION
The most prominent continuous distribution in statistics, frequently referred to as the
Gaussian or bell-shaped distribution. Its density function is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}, \qquad -\infty < x < \infty$$

with mean $\mu$ and variance $\sigma^2$. The theoretical justification for the normal
distribution lies in the central-limit theorem, which shows that under very broad
conditions the distribution of the average of n independent observations from any
distribution approaches a normal distribution as n becomes large.

NORMAL VARIABLE
A random variable that is normally distributed. In situations where a random variable
represents the total effect of many "small" independent causes, each with mutually
independent errors, the central limit theorem leads to the prospect that the variable
will be normally distributed.

OPERATING MODE
A specific pattern of system operation in which a designated subset of the system's
functions are realizable (e.g., standby mode, tracking mode, search mode).

OPERATIONAL READINESS
Operational availability of a weapon system in fleet service.

OPERATIONAL READINESS RELIABILITY (ORR)
The probability that at any point in time the system is either operating satisfactorily
or ready to be placed in operation on demand when used under stated conditions.

PARAMETER
A performance characteristic (e.g., voltage, pressure, time to peak pressure, velocity,
etc.) that may be measured on a continuum (variable) or go - no go (attribute) basis.

PERCENTILE
If p percent of the values taken by a variable X are less than or equal to x, then x is
defined as the pth percentile of X.

POINT ESTIMATE
A single-valued estimate of a population parameter.
POISSON DISTRIBUTION
A probability distribution applicable to situations where a large number of observations
is involved and the probability of an event occurring in any specific observation is
very small. It has the density equation

$$f(x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}$$

POISSON EXPECTATION
The mean μ of the Poisson distribution. In reliability problems the product λt is often
a Poisson expectation.

"POSTERIOR" DISTRIBUTION
In Bayesian terminology, a Probability Density Function g(θ|p) which modified a prior
distribution g(θ) through the use of Bayes' theorem when test data, such as times to
failure or to repair, became available.

PRECISION
A measure of the scatter of repeated measurements about their mean.

PREDICTION
Judgement of the characteristics of an item by means of engineering analysis, using
generic data and/or historical data obtained from antecedent items.

"PRIOR" DISTRIBUTION
In Bayesian terminology, a Probability Density Function g(θ) used as some prior
probabilistic degree of belief on one (θ) or several parameters of a function or of any
failure or repair model.

PROBABILITY DENSITY FUNCTION (PDF)
In the univariate case, it is a continuous function f(x) of a random variable x such
that its integral $\int_a^b f(x)\,dx$ represents the probability of x assuming a value
between a and b. The integral over all x is 1. In the multivariate case f(x) becomes
$f(x^{(1)}, x^{(2)}, \ldots, x^{(n)})$. The n-fold integral over all $x^{(i)}$'s is 1.

PROBABILITY MASS FUNCTION
It is the discrete analogue $f(x_i)$ of a statistical probability density function f(x).

PROGRAMS
Programs, specifically digital programs, are self-contained sets of instructions capable
of performing a specified function in the absence of other programs.

QUALITY ASSURANCE
A planned and systematic pattern of all actions necessary to provide adequate confidence
that material conforms to established technical requirements and achieves satisfactory
performance in service.
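As an illustration of the POISSON DISTRIBUTION and POISSON EXPECTATION entries, the sketch below evaluates the Poisson probabilities for a failure count when the expectation is the product of a failure rate and an exposure time. The rate, the exposure time, and the function names are assumed for the example.

```python
import math


def poisson_pmf(x, expectation):
    """Probability of exactly x events when the Poisson expectation is given."""
    return math.exp(-expectation) * expectation**x / math.factorial(x)


def poisson_at_most(k, expectation):
    """Probability of k or fewer events."""
    return sum(poisson_pmf(x, expectation) for x in range(k + 1))


failure_rate = 0.002   # hypothetical failures per hour
exposure = 500.0       # hypothetical hours of operation
expectation = failure_rate * exposure  # Poisson expectation (rate times time)

print(f"expected failures       = {expectation:.2f}")
print(f"P(no failures)          = {poisson_pmf(0, expectation):.4f}")
print(f"P(at most one failure)  = {poisson_at_most(1, expectation):.4f}")
```

The probability of zero events equals exp(-λt), which matches the exponential reliability for the same rate and time.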
RANDOM VARIABLE
An output of an experiment which may take any of the values of a specified set with a
specified relative frequency or probability.

REDUNDANCY
The existence of more than one means for accomplishing a given function. All means of
accomplishing the function need not necessarily be identical.

REDUNDANCY, ACTIVE
The redundancy wherein all redundant items are operating simultaneously.

REDUNDANCY, STANDBY
That redundancy wherein the alternative means of performing the function is inoperative
until needed and is switched on upon failure of the primary means of performing the
function.

REJECT RATE
The percent of units found defective by testing and then rejected.

RELIABILITY, R
The probability that an item will perform its intended function without failure for a
specified interval under stated conditions, given that it is up (operable) at the
beginning of the interval.

RELIABILITY PHASE
Any phase of a mission during which a reliability figure of merit applies (i.e., no
failure is permissible), e.g., launch, flight.

REPLACEMENT RATE
The rate at which spares are consumed. This is usually greater than the failure rate due
to replacement of non-failed hardware. The reciprocal is mean-time-between-replacements.

RISK, CONSUMER'S, β
The probability that a test will accept by chance a device or lot having a
characteristic equal to a specified unacceptable level.

RISK, PRODUCER'S, α
The probability that a test will reject by chance a device or lot having a
characteristic equal to a specified desired level.

SOFTWARE
Idealized set of instructions which constitute the essence of computer programs,
subprograms, and routines. Also contents of operating and maintenance manuals which show
how to run or modify programs or equipment. (Software is to be distinguished from the
hardware which supports the set of instructions, e.g., punched cards, magnetic tape,
core memory, paper of manuals.)

SOFTWARE ERROR
An incorrect statement or logical fault residing in the coded instructions of the
software.

SOFTWARE FAILURE
Software error revealed during execution of software.


SOFTWARE RELIABILITY
The probability that a given software program will operate without error for a specified
time in a specified mission.

SPECIFICATION
With respect to RMA requirements, a complete specification provides a design requirement
and the associated producer's risk and a test requirement (minimum acceptable
requirement) and the associated consumer's risk.

STATE A
A hardware/software state in which the item is non-operating but must be operational in
a later mission phase.

STATE B
A hardware/software state in which the item is non-operating and must not operate
prematurely. It must survive and be operational in a later mission phase.

STATE C
A hardware/software state in which the item is operating. The duration of operation is
measured in cycles or discrete events.

STATE D
A hardware/software state in which the item is operating. The duration of operation is
measured in units of time (e.g., hours and minutes).

SUBSYSTEM
The first indenture level below the system, SWS. Examples: fire control, missile and
navigation, etc.

SUCCESS CRITERIA
The minimum functional performance required of an item for mission success.

SYSTEM EFFECTIVENESS
A measure of the degree to which an item can be expected to achieve a set of specific
mission requirements and which may be expressed in terms of availability, reliability
and performance capability.

SYSTEM STATE
A designation of system status at a particular time with respect to operable and
inoperable equipments. An n-equipment system can exist in $2^n$ states ranging from all
equipments up to all equipments down.

TACTICAL
Pertaining to or necessary for the primary mission of the weapon system.

TIME
When used herein without a modifier the word time is interpreted to mean calendar time.
Active Time - That time during which an item is in the operational inventory. For the
FBMWS/SWS, the patrol period. It is the time base for availability calculations in this
book.

Alert Time - Time during which a system is ready for operation. For the FBMWS/SWS, on
station awaiting designated targets.

Checkout Time - That element of Repair Time during which repair is being confirmed and
verified to be satisfactory.

Corrective Maintenance Time (Repair Time) - That element of Downtime in which work is
done to repair trouble or failure. Includes time to obtain tools, documents and spares
from local stock rooms, set them up for repair, troubleshoot, test spares if necessary,
effect repair, make necessary adjustments and calibrations, confirm the repair by test
if necessary and close up the repaired item. It specifically excludes time devoted to
off-line repair of any item that was replaced. It also excludes elements of Delay Time
such as meals, sleep, administrative delays including the postponement of repair by
managerial decision, awaiting spares from off-site or remote locations, etc.

Downtime - Total time during which an item is not in condition to perform its intended
function.

Fault Correction Time - That element of Repair Time during which a failure is corrected
by (a) repairing in place, (b) removing, repairing, and replacing, or (c) removing and
replacing with a like serviceable item.

Fault Location Time - That element of Repair Time during which testing and analysis is
performed on an item to isolate a failure.

Inactive Time - That time during which an item is not in active inventory and therefore
not expected to be operable; for the FBMWS/SWS, time not spent on patrol. Not included
in availability calculations in this manual.

Item Obtainment Time - That element of Repair Time during which the needed item or items
are obtained from stockrooms within the facility. For the FBMWS/SWS, time to obtain
items from the ship's stores.
Maintenance Time - That part of Downtime when maintenance work is actually being done.

Mission Time - That element of Uptime when an item is performing its designated mission.

Modification Time - That element of Downtime during which specific modifications or
retrofits are made to an item to add to or improve its characteristics.

Operating Time - Cumulative Operating Time in testing or use.

Preparation Time - That element of Repair Time needed to obtain the necessary test
equipment and maintenance manuals and set up the necessary equipment.

Reaction Time - That element of Uptime needed to initiate mission functions, measured
from the time a command is received.

Uptime - That element of Active Time when an item is up, i.e., alert, reacting or
performing mission functions.

Uptime, Apparent - That element of Active Time when an item is thought to be up.
Apparent Uptime may be greater than Uptime when failure detection is not immediate.

UPTIME RATIO
The quotient of Uptime divided by Uptime plus Downtime. The Uptime ratio is a
statistical estimate of steady-state availability.

VARIABLE
A characteristic or property that is appraised in terms of scalar values.

VARIABLES METHOD
A method whereby the values of certain measurable parameters are equated to "failures"
or "errors" if they lie beyond the range of specified critical values. The values of the
parameters are often assumed to be normally distributed.

VARIANCE, σ²
The second moment about the mean of a probability distribution. A measure of the
dispersion of a random variable about its mean value. In testing, variance is a measure
of random errors in a series of measurements.
VARIANCE OF ESTIMATE - The variance associated with
a parameter estimate obtained by sampling. For
example, $\sigma_{\bar{x}}^2 = \sigma^2 / n$ is the variance of
estimate of the mean $\bar{x}$ of samples of size n from
the distribution of the random variable x having
variance $\sigma^2$. The square root of the variance of
estimate is called the Standard Error.

WEIBULL DISTRIBUTION - The distribution of the
smallest value of life, strength or similar property
among a sample of components is modeled by the pdf


t>0,  a.  >0 
t<0 


The function, called the Weibull model, applies
when failure is determined by the strength of a
weakest link. The function was introduced by
Weibull on empirical grounds based on studies of
material strength. It was later derived by
Freudenthal and Gumbel from extreme value theory, as
the type III asymptote of the minimum extreme
value among measurements modeled by an initial
distribution bounded below.

The location parameter $\alpha$ is minimum expected
life; the shape parameter $\beta$ is a slope. There is also
a three parameter Weibull Distribution which
involves a scale parameter representing absolute
minimum life.
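
The role of the shape parameter can be illustrated with a short sketch. It assumes the widely used two-parameter form with a scale parameter and a shape parameter, which may not match the parameterization used for the pdf in this glossary entry; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
import math

# Assumption: standard two-parameter Weibull with `scale` (characteristic life)
# and `shape`. This is a common textbook form, not necessarily the form
# printed in this manual.

def weibull_reliability(t, shape, scale):
    """Probability of surviving beyond time t (t >= 0)."""
    return math.exp(-((t / scale) ** shape))

def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate at time t (evaluate at t > 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def weibull_pdf(t, shape, scale):
    """Density of time to failure; zero for negative t."""
    if t < 0:
        return 0.0
    return weibull_hazard(t, shape, scale) * weibull_reliability(t, shape, scale)

if __name__ == "__main__":
    scale = 200.0  # hypothetical characteristic life, hours
    for shape in (0.5, 1.0, 2.0):
        early = weibull_hazard(10.0, shape, scale)
        late = weibull_hazard(100.0, shape, scale)
        print(f"shape={shape}: h(10)={early:.5f}, h(100)={late:.5f}")
```

Under this assumed form, a shape value below one gives a decreasing failure rate (DFR), a value of one reduces to the exponential case with constant failure rate (CFR), and a value above one gives an increasing failure rate (IFR).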
LIST OF SYMBOLS, ACRONYMS
AND ABBREVIATIONS

Symbols 

Coefficient  used  to  compute  confidence  interval. 

Steady-state  availability. 

Achieved  availability.  Availability  with  respect  to  failure  (corrective  main¬ 
tenance)  and  preventive  maintenance  jointly. 

Inherent  availability.  Availability  with  respect  to  failure  (corrective  main¬ 
tenance). 

Operational  availability.  Availability  attained  in  actual  use. 

Availability  with  respect  to  preventive  maintenance. 

Pointwise  availability. 

Interval  availability. 

A  cut  or  set  of  components  which  if  all  fail  will  fail  the  subsystem. 

Confidence level; in basic method the ratio $\sigma^2$/X.

Length  of  component  reliability  confidence  interval.  Also  observed  downtime 
in  an  interval. 

Largest  absolute  deviation,  also  downtime. 

Base  of  natural  logarithms. 

Degrees  of  freedom. 

Probability  density  function,  or  probability  mass  function. 

Fractile  of  the  cumulative  F -distribution. 

Event  that  at  least  one  cut  in  G  occurs. 

An  independent  group  of  cuts. 

Hazard  rate  function. 

Subscripts  denoting  respectively  component,  environment,  test  state. 

Factor  used  with  accelerated  testing. 
Standard normal deviate.

Tolerance  factor  for  reliability  R,  confidence  C,  sample  size  n,  degrees  of 
freedom  f. 

Subscript  denoting  lower  confidence  limit. 

Number  of  successful  units  in  m  of  n  configuration;  also  number  of  observed 
repairs. 

Mean  of  a  normally  distributed  random  variable;  also  equivalent  number  of 
missions. 

Corrective  maintenance  time;  repair  time. 

Mean  corrective  maintenance  time;  MTTR. 

Geometric  mean  corrective  maintenance  time. 

Sample size; also number of variables in model.

Normal  PDF. 

Number  of  tests.  Also  number  of  items  on  test. 

Normal  CDF. 

Subscript  denoting  parallel-related  components. 

Probability. 

Unreliability,  (1-R). 

Reliability. 

Subscript denoting serial-related components; also sample standard deviation;
also number of successes in a test.

Safety.

Actual test time; also the Student's t statistic.

Time between the (i-1)th and the ith failure when n items are placed on test
without replacement.

Time to failure of the ith item since the beginning of testing.

Time to failure of the last failing item.

Planned test time; also a period of fixed or nominal length.

Observed uptime in an interval; also subscript denoting upper confidence limit.

Unavailability; also uptime.
$U_i$  Unavailability with respect to failure; inherent unavailability.

$U_p$  Unavailability with respect to preventive maintenance.

W  Statistic used in W-test of normality; also ratio of two random variables.

x  Number of failures.

$\bar{X}$  Mean of a sample.

$\alpha$  Producer’s risk; also coefficient used to convert test or operating time (or
cycles) to equivalent missions.

$\beta$  Consumer’s risk; also a bias correction factor in the basic method.

$\gamma$  Significance level.

$\gamma(\ )$  Gamma density function.

$\Gamma$  Gamma function.

A(  )  Gamma  CDF. 

$\delta$  Delta function.

$\Delta$  An incremental change in the value of a variable.

$\epsilon$  Limiting acceptable risk; also error.

$\theta$  Mean time to failure; also mean time between failures.

$\lambda$  Failure rate.

$\mu$  Mean of a normally distributed random variable; also repair rate.

r j  Warrantee  period. 

v  Reciprocal  of  MC(. . 

$\pi$  3.14159

$\Pi$  Symbol denoting product.

$\rho_{ij}$  Simple coefficient of correlation between ith and jth variables.

Multiple correlation coefficient for ith variable.

$\sigma$  Standard deviation of a normally distributed random variable.

$\sigma^2$  Variance of a normally distributed random variable.
$\Sigma$  Symbol denoting summation.

In  the  Bayesian  method,  prior  or  pseudo  test  time. 

In  the  Bayesian  method,  prior  or  pseudo  failures. 

Chi-square  statistic. 

Indicates  statistical  point  estimate. 

Prime  symbol  denotes  predicted  value. 

Star  symbol  denotes  required  value,  or  objective. 

Bar denotes negation (e.g., $\bar{A}$ denotes not A); also denotes average value.

Union symbol, interpreted as the logical “or”.

Intersection  symbol,  interpreted  as  the  logical  “and”. 

Is  distributed  as. 

Becomes. 

Paragraph. 
Acronyms and Abbreviations


AO 

Approximately  Optimum. 

BAN 

Best  Asymptotically  Normal. 

BICS 

Boolean  Indicated  Cut  Sets. 

BLU 

Best  Linear  Unbiased. 

CA 

Corrective  Action. 

CAR 

Corrective  Action  Report. 

CDF 

Cumulative  Distribution  Function. 

CDR

Critical  Design  Review. 

CDRL

Contract  Data  Requirement  List. 

CFR 

Constant  (Instantaneous)  Failure  Rate. 

CLIFS 

Coordination,  Life,  Interchangeability,  Function  and  Safety. 

DFR 

Decreasing  (Instantaneous)  Failure  Rate. 

DSARC 

Defense  Systems  Acquisition  Review  Council. 

EDM 

Engineering  Development  Model. 

FBMWS 

Fleet  Ballistic  Missile  Weapon  System. 

FLTAC 

Fleet  Analysis  Center. 

FMECA 

Failure  Mode,  Effects,  and  Criticality  Analysis. 

FOT 

Follow-on  Operational  Test. 

FSR 

Failure  Summary  Report. 

FTA

Fault  Tree  Analysis. 

IFR 

Increasing  (Instantaneous)  Failure  Rate. 

ITP 

Integrated  Test  Program. 

ITPP 

Integrated  Test  Program  Plan. 

LCLS 

Lower  Critical  Limit  Specification. 

MEC 

Mission  Essentiality  Code. 
MIL HDBK

Military  Handbook. 

MIL  STD 

Military  Standard. 

ML 

Maximum  Likelihood. 

MLE 

Maximum  Likelihood  Estimate. 

MTBF 

Mean  Time  Between  Failures. 

MTTF 

Mean  Time  to  Failure. 

MTTR 

Mean  Time  to  Repair. 

MVU 

Minimum  Variance  Unbiased. 

NAVMAT 

Naval  Material  Command. 

NAVORD 

Naval  Ordnance  Systems  Command. 

NAVSEA 

Naval  Sea  Systems  Command. 

NAVWEPS 

Bureau  of  Naval  Weapons. 

OD 

Ordnance  Document. 

OPEVAL 

Operational  Evaluation. 

OT 

Operational  Test. 

PAT 

Production  Assessment  Test. 

PDF 

Probability  Density  Function. 

PDR 

Preliminary  Design  Review. 

PEM 

Performance  Evaluation  Missile. 

PERT 

Program  Evaluation  Review  Technique. 

PLS 

Production  Lot  Sampling. 

PMF 

Probability  Mass  Function. 

POMP 

POSEIDON  Modification  Program. 

PPM 

Program  Plan  Matrix. 

PRST 

Probability  Ratio  Sequential  Test. 

PUAD 

Parts  Usage  and  Application  Data. 
RDT&E

Research, Development, Test and Evaluation.

RMA

Reliability, Maintainability and Availability.

R&R

Repair and Refurbishment, Repair and Replacement, etc.

RV

Random Variable.

SITP

Shipyard Installation Test Program.

SOR

Specific Operational Requirement.

SOW

Statement of Work.

SPALT

SSPO Alteration.

SRA

System Requirements Analysis.

SSPO

Strategic Systems Project Office.

SWS

Strategic Weapon System.

TAAF

Test Analyze and Fix.

TDP

Technical Development Plan.

TECHEVAL

Technical Evaluation.

TOG

Technical Objectives and Guidelines Document.

TRD

Technical Requirements Document.

UCLS

Upper Critical Limit Specification.
Section 1

INTRODUCTION


1.1  PURPOSE 

The Strategic Systems Project Office
(SSPO) requires each prime contractor to
satisfy specified reliability, maintainability
and availability (RMA) operational objectives
and RMA evaluation requirements. These
requirements are primarily specified through
NAVSEA OD 21549[1], and by means of
Technical Objectives and Guidelines Documents
(TOGs), weapon specifications, and
SSPO policy statements cited in applicable
contracts.

The primary purpose of this document is
to provide SSPO contractors with a set of
RMA evaluation methods and techniques
to facilitate compliance with contractual
RMA evaluation requirements and to foster
the achievement of specified RMA operational
objectives.

1.2  SCOPE 

This document is concerned primarily with
evaluating the reliability and availability of
strategic weapon systems and subsystems
throughout the life cycle of the program,
including concept development, advanced
development, full-scale development, production
and deployment. However, not all SSPO
contractors build subsystems; some supply
equipments or other lower level assemblies.
The methods presented in this manual are
applicable to devices of any assembly level.
Thus, the term system, as used herein, should
be interpreted as the highest level assembly
provided by a contractor. Evaluation as
treated herein is normally quantitative;
however, in some cases (e.g., the logistic phase
of a mission), qualitative evaluation is
performed. Evaluation, as discussed herein, is
intended to encompass those technical
program management elements described in
the paragraphs of NAVSEA OD 21549[1]
listed in Figure 1-1.

The major objectives of the evaluation
program are:

•  to establish reliability design criteria

•  to provide periodic assessments of
achieved reliability

•  to identify RMA problem areas

•  to provide evidence of compliance
with contractual requirements.

It  is  recognized  that  many  functions  other 
than  evaluation  are  also  necessary  to  an 
effective  reliability  and  availability  program. 
Figure  1-2  lists  some  of  them. 

While it is fully recognized that many of
the functions listed in Figure 1-2 are major
contributors to the design of reliable and
maintainable systems and to the detection
and correction of reliability and maintainability
problems, detailed discussion of those
functions is beyond the scope of this manual.
They are discussed herein only to the extent
that they impact the evaluation functions.

1.3  APPLICABILITY 

The evaluation methods described in this
document are intended for use by SSPO
contractors and subcontractors to the extent
specified in their contracts, throughout the
entire contractual period. It should be noted
that the methods in this manual may be
used on equipment procured for training as
well as on tactical equipment. NAVSEA OD
21549[1] is structured for use in a
variety of contract phases (e.g., concept
development, advanced development, full-scale
development, production and deployment).
The OD also recognizes that hardware and
software are procured at various indenture
levels (e.g., subsystem, equipment,
component). Not all requirements are uniformly
applicable in all contract phases.
NAVSEA OD 21549[1] provides a vehicle
(Appendix B) to permit tailoring RMA program
requirements to contract needs.

While standardization is desirable, it is
recognized that new methods may be developed
in the future which offer significant advantages,
and that special situations may arise
in a program for which the methods discussed
herein may not be optimal. Although SSPO
contractors are encouraged to use the methods
described herein, they are also encouraged
to develop their own methods for special
situations. It is intended that this manual be
reviewed periodically to ensure its accuracy
and currency. Users are encouraged to report
errors discovered and recommendations for
improvement to:

Department of the Navy
Strategic Systems Project Office
Washington, D.C. 20376
Attention: Code SP2014

While primarily intended for use in SSPO
programs, the methods presented in this
manual are equally applicable for evaluating
the reliability and availability of many complex
military or industrial products.

Figure 1-1. Evaluation Functions Included in
NAVSEA OD 21549[1]

1.4  REFERENCES 

1. NAVSEA OD 21549A: Technical Program
Management Requirements for Navy SSPO
Acquisitions.
Section 2

RELIABILITY  AND  AVAILABILITY 


2.1  IMPORTANCE  OF  RELIABILITY  AND 
AVAILABILITY 

A major task facing a system manager early
in a development program is the establishment
of system requirements. Among the
important quantitative system requirements
are those specified for reliability and availability.

Reliability is important because of the
threat of catastrophic loss implicit in functional
failure of any part of a weapon system
in fleet service.

Availability is of concern when repair of
failures can be achieved during a mission
phase. The randomness of demands on systems
in service often permits repair of failure
without degradation of mission performance.
Degradation in performance occurs only when
a demand occurs at a time when the system
is down for repair or undergoing preventive
maintenance (including calibration).

SSPO requires timely and accurate evaluation
of reliability and availability, beginning
in the development phase of weapon system
procurement, as an input to decision making
and program control. The evaluation process
as defined by this document involves apportionments
and predictions followed by measurement
and is supported by a variety of
analyses (e.g., mission analysis; system analysis;
analysis of the integrated test program;
failure mode, effects and criticality analysis).
The principal purpose of apportionments is
to develop reliability and availability design
criteria for system elements. The principal
purpose of predictions is to provide periodic
forecasts of the reliability and availability of
the projected (final) design. Apportionments
and predictions are followed by objective
measurement of reliability and availability
beginning when the earliest fabricated units
become available for testing. The total life
history of development hardware is observed.
A data system is used to record development
history and to facilitate computation of point
and interval estimates of reliability and availability.
Often, early test data are derived from
tests of equipments or other subordinate
assembly levels. System estimates are synthesized
by means of a mathematical model.
Later, these are supplemented by tests of
larger assemblies and the system. Use of a
system mathematical model takes advantage
of the variety of tests performed at various
assembly levels during development. Alternatively,
test data can be taken only at system
level. In either case the data system provides
for largely automated preparation and updating
of numerical reports in standardized
formats suited to the needs of program management.
At all stages of the evaluation (prediction
and measurement), results are compared
to specified requirements to determine
if corrective action is required.

2.2  NATURE  OF  RELIABILITY  AND 
AVAILABILITY 

2.2.1  Nature  of  Availability 

Availability is a dimensionless number defined
on the interval [0, 1]. Availability is
defined as the probability that a system will
be up or operable when called upon during
a mission. This definition implies the need for
analysis of both the system and its mission in
quantifying availability. The FBMWS/SWS
TOGs have an operational readiness reliability
objective. Availability is equivalent to Operational
Readiness Reliability in this manual.
It is an appropriate index during the operational
readiness portion of the FBMWS/SWS
mission.
A system is down when it is not usable because
it is undergoing maintenance [corrective
or preventive (including calibration)]; it is
available only when it is up with respect to
both corrective and preventive maintenance.
Thus, system availability is equal to or greater
than the product of its availability with respect
to failure (corrective maintenance), $A_c$,
and its availability with respect to preventive
maintenance, $A_p$:

$$A \geq A_c \cdot A_p \tag{2-1}$$

Availability may be greater than the product
because of dependencies between preventive
and corrective maintenance and because
preventive maintenance may often be
shortened or postponed when demand occurs.
Because of these aspects, this manual
will treat availability only from a corrective
maintenance viewpoint. The subscript c in $A_c$
will be dropped. Availability must be represented
by an equation embodying appropriate
assumptions as to the failure and maintenance
processes applicable to the system under mission
conditions.

Many different availability indices can be
derived, because the functions and requirements
of systems differ widely; thus no single
index or figure-of-merit can meaningfully
represent availability for all systems [1].
Moreover, availability may depend on the care
and skill with which a system is transported,
operated and maintained, as well as on the
system design. However, in this manual, use
of the term is limited to measures which are
primarily properties of system design. This
manual assumes an adequate number of
spares, etc. for system availability and does
not address measures of availability which are
primarily reflections of logistic support,
spares provisioning and similar factors.

The availability, A, of a system is estimated
by a function of mean time between failure
(MTBF) and mean time to repair (MTTR),

$$A = \frac{MTBF}{MTBF + MTTR} \tag{2-2}$$

When the system has exponential failure
and repair rates, $\lambda = 1/MTBF$ and
$\mu = 1/MTTR$. This substitution leads directly to
equation 2-4.

The availability of a system over a period
of time, such as a submarine patrol of duration
T, is known as interval availability and is
given by equation 2-3.

$$A_T = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{(\lambda + \mu)^2 T}\left[1 - e^{-(\lambda + \mu)T}\right] \tag{2-3}$$

where

$\mu$ is the system repair rate.

$\lambda$ is the system failure rate.

Equation 2-3 includes a steady-state term
and a transient term.

As T, the mission duration, is increased, the
interval availability approaches a constant
value, designated steady-state availability,
which is written:

$$A = \lim_{T \to \infty} A_T = \frac{\mu}{\lambda + \mu} \tag{2-4}$$

For missions of sufficient length, the transient
term in equation 2-3 is ignored and
steady-state availability is taken as the applicable
measure. Throughout this manual it
is assumed that the relevant mission is long
enough to justify the use of steady-state
availability. Any error stemming from this
assumption will tend to render the analysis
conservative; that is, interval availability is
always greater than steady-state availability.
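
The relationship between the steady-state and interval forms can be checked numerically. The following is a minimal sketch, assuming exponential failure and repair and a system that is up at the start of the interval; the function names and the patrol length, MTBF and MTTR values are hypothetical and are not taken from this manual.

```python
import math

def steady_state_availability(mtbf, mttr):
    """Steady-state availability, A = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

def interval_availability(mtbf, mttr, duration):
    """Interval availability over `duration` for exponential failure and
    repair: a steady-state term plus a transient term.
    Assumption: the system is up at the start of the interval."""
    lam = 1.0 / mtbf   # failure rate
    mu = 1.0 / mttr    # repair rate
    s = lam + mu
    steady = mu / s
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for small x.
    transient = lam / (s * s * duration) * (-math.expm1(-s * duration))
    return steady + transient

if __name__ == "__main__":
    mtbf, mttr = 500.0, 5.0          # hypothetical values, hours
    for hours in (1.0, 24.0, 1440.0):
        a_int = interval_availability(mtbf, mttr, hours)
        print(f"T={hours:7.1f} h  interval availability={a_int:.6f}")
    print(f"steady-state availability={steady_state_availability(mtbf, mttr):.6f}")
```

For very short intervals the interval availability is close to one, and as the duration grows it falls toward the steady-state value from above, which is why using the steady-state figure is conservative.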

For many systems, immediate failure
detection and repair are not achievable for all
failure modes. Some modes of failure will be
automatically detected and alarmed, others
will be periodically detected during the patrol,
and still others will not be detectable
during a patrol. This fact gives rise to the
concepts of actual and apparent availability.
Apparent availability is a measure of the observed
status. Actual availability is an index
which accounts for the undetected failures
by predicting the number of failures that exist
but will not be detected. This index is of
primary use to system planners, permitting
them to perform trade-off studies leading to
extra failure detection capability. Before
using any availability equation, the analyst
must determine that failure detection and repair
is a viable option. If it is not, reliability
is the appropriate measure.
2.2.2 Dependence of System Availability on
the Reliability and Maintainability
of System Elements

The availability exhibited by a system during
periods of stated length is a statistically
distributed variable. Its distribution is jointly
determined by the system's reliability and
maintainability, which in turn depend on the
corresponding properties of the equipments
that constitute the system.

Reliability determines the frequency of unscheduled
downtimes; maintainability determines
their durations and the duration of
downtimes for other purposes (e.g., scheduled
preventive maintenance and calibration). In
its initial configuration the system will possess
certain inherent levels of reliability and
maintainability which together establish an upper
limit on the availability attainable by the system.
Beyond this limit, additional effort can
be applied to improve item reliability or
maintainability or both, in order to increase
availability. Figure 2-1 illustrates the dependence
of availability on reliability and
maintainability. The relative impacts of
reliability and maintainability upon availability
are often studied in specific situations
and the results used in trade-offs and the setting
of RMA specifications.
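
A trade-off study of this kind can be sketched in a few lines. The example below assumes the steady-state relation A = MTBF / (MTBF + MTTR); the function names and the grid of MTBF and MTTR values are hypothetical and serve only to show how the two properties combine.

```python
def availability(mtbf, mttr):
    """Steady-state availability from MTBF and MTTR (same time units)."""
    return mtbf / (mtbf + mttr)

def max_mttr_for_target(mtbf, target):
    """Largest MTTR that still meets a target availability at a given MTBF.
    Obtained by solving target = MTBF / (MTBF + MTTR) for MTTR."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target availability must be in (0, 1]")
    return mtbf * (1.0 - target) / target

def tradeoff_table(mtbf_values, mttr_values):
    """Availability for every (MTBF, MTTR) pair, keyed by the pair."""
    return {(m, r): availability(m, r) for m in mtbf_values for r in mttr_values}

if __name__ == "__main__":
    table = tradeoff_table([100.0, 500.0, 1000.0], [1.0, 5.0, 10.0])  # hours
    for (m, r), a in sorted(table.items()):
        print(f"MTBF={m:7.1f}  MTTR={r:5.1f}  A={a:.4f}")
    print(f"MTTR allowed at MTBF=1000 h for A=0.99: {max_mttr_for_target(1000.0, 0.99):.2f} h")
```

The same availability can be reached by raising MTBF or by lowering MTTR, so a table of this kind gives a quick view of which change is the less demanding route to a specified value.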

2.2.3  Nature  of  Reliability 

Reliability is defined as the probability of
functioning without failure during a specified
mission or a portion of a mission. It is an
index appropriate to the launch and flight
phases of the FBMWS/SWS mission. Reliability
also applies to any phase where continuous
or standby operation is required and
repair is not possible. The definition presupposes
explicit definition of a nominal design
mission or range of possible missions,
and also assumes explicit success criteria, a
particular subset of hardware and software
functions necessary to the success of the
primary mission.

Figure 2-1. Dependence of Availability (A) Upon Reliability (MTBF) and Maintainability (MTTR)

Reliability is expressed as a dimensionless
real number in the interval [0, 1], where zero
represents certainty of failure and unity represents
certainty of success. A general mathematical
expression for the reliability of time
dependent devices is:

$$R(T) = \exp\left[-\int_0^T h(t)\,dt\right] \tag{2-5}$$

where T is mission duration, R(0) = 1 and
h(t) is the so-called hazard rate or instantaneous
rate of failure. While h(t) is formally
time dependent, both theory and experience
support the supposition that for complex
systems the rate can often be approximated
by a constant after an initial early mortality
period and prior to wear out [2-Chapters 4
and 5]. This characteristic shape of the hazard
function has given rise to the familiar
“bathtub curve”, figure 2-2.

Figure 2-2 is a general portrayal of the time-varying
reliability characteristics of equipment
through its life cycle. Good reliability
practice is to operate equipment only
during its useful life period, between the so-called
green-time line, $T_G$, and red-time line,
$T_R$. In this flat portion of the curve h(t) is
called failure rate and is designated by the
symbol $\lambda$; its reciprocal, $1/\lambda$, is the mean time
between failure, MTBF. It can be shown that
the probability density of time to failure in
this region, f(t) - the unconditional instantaneous
rate of failure - is described by the
exponential distribution, a one-parameter
function fully defined by the single parameter
$\lambda$. An equivalent expression for reliability
in terms of f(t) is:

$$R(T) = \int_T^\infty f(t)\,dt \tag{2-6}$$

which, for the exponential case, is

$$R(t) = e^{-\lambda t} \tag{2-7}$$

Clearly, evaluation of $\lambda$ or MTBF is equivalent
to evaluation of the complete reliability
function over the entire range of t, wherever
the exponential model applies.
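
The link between the hazard rate and the exponential reliability function can be verified numerically. This is a sketch only: it assumes reliability is the exponential of the negative integrated hazard, uses a simple trapezoidal rule, and the function names and the MTBF and mission length are hypothetical.

```python
import math

def reliability_from_hazard(hazard, duration, steps=10_000):
    """Reliability over `duration` as exp(-integral of hazard from 0 to duration),
    with the integral approximated by the trapezoidal rule."""
    dt = duration / steps
    area = 0.5 * (hazard(0.0) + hazard(duration))
    for i in range(1, steps):
        area += hazard(i * dt)
    return math.exp(-area * dt)

def exponential_reliability(mtbf, duration):
    """Reliability for a constant failure rate, R(t) = exp(-t / MTBF)."""
    return math.exp(-duration / mtbf)

if __name__ == "__main__":
    mtbf = 2000.0      # hypothetical MTBF, hours
    mission = 100.0    # hypothetical mission length, hours
    constant_hazard = lambda t: 1.0 / mtbf
    print(reliability_from_hazard(constant_hazard, mission))
    print(exponential_reliability(mtbf, mission))
```

With a constant hazard the two functions agree, which is the sense in which a single value of the failure rate or MTBF defines the whole reliability function. Passing a time-varying `hazard` to the first function covers the non-exponential case.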


Figure 2-2. Hazard Rate as a Function of Operating Time
If a system remains in service beyond $T_R$,
its failure rate increases rapidly until the system
can no longer be supported economically
by available spares and maintenance facilities.

When hazard rates are time-varying, as is
often true of simple devices, the resulting
failure distributions and reliability functions
are generally much more complex than in the
exponential case [3-Chapter 2]. For such
devices, definition of a complete reliability
function may require that two or three
parameters be given, although a discrete value
of R(t) may always be defined directly for a
mission of given length.

A number of researchers in the reliability
field [4-pages 402-407 for example] have suggested
that the advent of semiconductor devices
has reduced equipment failure rates and
lengthened their useful life beyond the normal
use period for a system. In effect, these
researchers claim that the failure characteristics
of electronic equipment have changed
and that present day electronic equipment has
a decreasing failure rate (DFR). It has been
shown [5-pages 375-383] that mixtures of
exponentials have a DFR.

Consider for example two components
with constant failure rates $\lambda_1$ and $\lambda_2$ where
$\lambda_1 > \lambda_2$. One thousand of these components
are put in service (500 of each type). Initially,
the failure rate is $(\lambda_1 + \lambda_2)/2$ but after 500
failures (without replacement) the failure
rate of the surviving population would be
lower since the expected number surviving
from the $\lambda_2$ population is higher than the
expected number surviving from the $\lambda_1$
population. The mixture has a DFR.
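
The effect described in this example can be reproduced with a short calculation. The sketch below assumes an equal mixture of two exponential populations and tracks the hazard rate of the survivors as a function of time rather than of the number of failures; the function name and the two failure rates are hypothetical.

```python
import math

def mixture_hazard(t, rates, weights):
    """Hazard rate at time t of a mixed population of exponential components.
    Computed as (mixture density) / (mixture survival probability)."""
    survivors = [w * math.exp(-r * t) for r, w in zip(rates, weights)]
    density = [r * s for r, s in zip(rates, survivors)]
    return sum(density) / sum(survivors)

if __name__ == "__main__":
    lam1, lam2 = 0.01, 0.001          # hypothetical failures per hour, lam1 > lam2
    for t in (0.0, 100.0, 500.0, 1000.0, 2000.0):
        h = mixture_hazard(t, (lam1, lam2), (0.5, 0.5))
        print(f"t={t:7.1f} h  hazard={h:.6f}")
```

At time zero the hazard is the average of the two rates; as the higher-rate components fail out, the survivors are increasingly drawn from the lower-rate population and the hazard declines toward the smaller rate.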

An important consideration with DFR
devices is when to put them into service.
When the failure rate is decreasing rapidly
(e.g., as in the burn-in period of figure 2-2) a
minimum burn-in period should be required
and service use should begin after $T_G$. Even
with DFR, service use may be permitted when
the hazard rate reaches an acceptably low
level. Of course if screening or burn-in is
economical it may be cost effective to perform
more screening, etc. prior to use. Figure
2-2 may still illustrate this process with minor
changes. The useful life period would have a
decreasing slope and $T_R$ is not reached during
the useful life of the system.


A curve similar to figure 2-2 applies for
one-shot devices. In this case the reliability is

$$R = 1 - \frac{x}{n} \tag{2-8}$$

where

x is the number of failures

n is the sample size
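
As a worked illustration of the one-shot calculation, the sketch below estimates reliability as one minus the ratio of failures to trials. The function name and the test counts are hypothetical, and the input checks are an added assumption rather than part of the manual's method.

```python
def one_shot_reliability(failures, trials):
    """Point estimate of one-shot reliability, R = 1 - x / n."""
    if trials <= 0:
        raise ValueError("sample size n must be positive")
    if not 0 <= failures <= trials:
        raise ValueError("failures x must lie between 0 and n")
    return 1.0 - failures / trials

if __name__ == "__main__":
    # Hypothetical test record: 3 failures observed in 50 firings.
    print(one_shot_reliability(3, 50))
```

The ratio of failures to trials is the estimated probability of failure per trial, so the function returns its complement.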

The primary reason for the decrease prior
to $T_G$ in figures 2-2 and 2-3 is that manufacturing
defects are being eliminated by test and
inspection. The increase after $T_R$ is due to
phenomena such as mechanical wear and
chemical deterioration. Maintenance should
be performed when an item reaches $T_R$.

The hazard rate for equipment which operates
continuously or periodically, such as
electronic equipment or jet engines, is most
often given as a failure rate, $\lambda$, and is expressed
in failures per million hours or failures
per mission. When $\lambda$ is constant, MTBF = $1/\lambda$.
The hazard rate for one-shot devices is a decimal
value between 0 and 1 and is usually estimated
by the ratio of total failures to total
trials.

2.2.4  Nature  of  Maintainability 

Maintainability is a characteristic of system
design and installation. Mean preventive
maintenance time, mean repair time, and mean
down time are commonly used indices of
maintainability.

In this manual, assessment of maintenance
actions will be limited to actions involved in
corrective maintenance (see corrective
maintenance time in figure 2-4); that is,
modification time, delay time and preventive
maintenance down time will be excluded from
estimates of mean time to repair.

2.3  THE  SPECIFICATION  OF  RMA
REQUIREMENTS

MIL-STD-490 [6] provides general guidelines
for the preparation of specifications.
The specification addressed in this manual is
the specification of meaningful numerical
RMA requirements to guide design and procurement
activities and to facilitate evaluation
of the system as development progresses.


Figure 2-3. Failure Probability Curve as a Function of Age of One-Shot Devices


Figure 2-4. Time Relationships

An explicit definition in engineering terms
of the intended mission or range of missions
and the environments associated with them is
essential to proper specification of reliability
and availability at every assembly level. The
assumption that reliability and availability
characteristics are system parameters independent
of mission variables is a common
error in preparing specifications. Reliability
and availability depend critically on the mission.
Both the numerical values and the
functional forms of the applicable indices depend
on the mission. Equally as necessary as
mission definition is a numerical requirement
applicable to the weapon system or other top-level
procurement item. It is the requirement
from which system reliability and availability
requirements must be derived, and to which
subsequent evaluation results must be related.

It is normal practice to establish quantitative
requirements for each phase of a tactical
mission. Consider, for example, a submarine
that  goes  to  sea  on  periodic  patrols  and  will 
fire  missiles  at  assigned  targets,  only  if  such 
action  is  ordered.  If  no  missiles  are  fired  on  a 
given  patrol,  the  operational  readiness  phase 
represents  the  entire  mission.  If,  however,  one 
or  more  missiles  are  fired  during  the  patrol, 
the  mission  consists  of  three  phases,  namely 
operational  readiness,  launch  and  flight.  In 
this  example  quantitative  requirements  would 
be  established  for  the  three  tactical  phases, 
as  indicated  below: 


Operational Readiness    Launch         Flight
Reliability              Reliability    Reliability

R_OR                     R_L            R_F

When mission phase reliabilities are statistically
independent, the system reliability requirement,
R_S*, is:

R_S* = R_OR* R_L* R_F* = A* R_L* R_F*    (2-9)

since by definition in this manual A* = R_OR*.

When they are not independent, methods
such as those covered in § 4.2.2.1.2.4 are
required.
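
As a minimal sketch of the independent-phase case, the product in equation (2-9) can be computed as follows. The function name and the three phase values are hypothetical and are used only to show the arithmetic; the product form does not apply when the phases are statistically dependent.

```python
from math import prod
from typing import Iterable


def system_reliability(phase_reliabilities: Iterable[float]) -> float:
    """Product of statistically independent mission phase reliabilities (equation 2-9)."""
    values = list(phase_reliabilities)
    if not values:
        raise ValueError("at least one phase reliability is required")
    if any(not 0.0 <= r <= 1.0 for r in values):
        raise ValueError("each phase reliability must lie between 0 and 1")
    return prod(values)


if __name__ == "__main__":
    # Hypothetical requirements: operational readiness (availability), launch, flight.
    print(system_reliability([0.95, 0.98, 0.90]))
```

Because every factor is at most 1, the system value can never exceed the lowest phase value, which is why each phase requirement must be set with the others in view.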


The superscript symbol (*), to be read
"star," affixed to each symbol identifies a requirement
or objective. Accordingly, the system
development program must be tailored
to meet or exceed each specified numerical
requirement.

The logistic phase of the mission consists
of the transportation, handling and storage,
installation and [illegible]
prior to the mission and any necessary refit.
Numerical requirements are not routinely
provided for the logistic phase. The system
development program must consider the logistic
phase and insure that the system is not degraded
by the logistic environment.

Availability objectives may be assigned to
certain support equipment used in the processing,
test and inspection portion of the
logistics phase to assure an adequate flow of
ready-for-issue equipment to the fleet.

2.3.1  Specification  for  Availability 

It is not only possible, but frequently desirable,
to establish a single quantitative
requirement for availability, designated herein
as A*. Its value must necessarily lie in the
interval 0 to 1. For tactical reasons it should
be  as  close  to  1  as  cost,  life  cycle  support  re¬ 
quirements  and  state-of-the-art  permit.  The 
value  selected  for  A*  will  depend  upon  a 
variety  of  factors  such  as  tactical  mission 
needs,  anticipated  complexity  of  the  system, 
expected  reliabilities  of  system  elements, 
maintenance  policy  (e.g.,  what  equipment 
will  be  removed  and  replaced  during  the 
mission  if  failure  occurs,  spares  stocking 
points,  etc.),  capabilities  of  both  diagnostic 
equipment  and  operators  in  fault  isolation, 
and  consequences  of  non-spareable  equipment 
failures.  The  selection  of  the  value  for  A* 
should,  therefore,  be  done  only  after  all  rele¬ 
vant  factors,  such  as  those  cited  above,  are 
analyzed  and  trade-off  decisions  are  made. 

Specification of a single value of A* provides
a wide trade-off region for reliability
(MTBF) and maintainability (MTTR). In
many cases it is necessary to constrain this
choice by specifying a minimum value of
MTBF or a maximum value of MTTR. The advantage
of including constraints on reliability
or maintainability is that they preclude undesirable
system trade-offs (e.g., a computer
with an MTBF of 2 hours and an MTTR of 0.1
hours would have an availability of 0.95, but
the probability of successfully running a program
requiring 2 hours would be less than 37
percent). As shown in figure 2-1, specifying
any two of the RMA parameters fixes the
third (e.g., if MTBF = 62.5 hours and MTTR
= 1 hour, A must be 0.984).
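
The computer example above can be checked numerically. The sketch below assumes the steady-state relation A = MTBF / (MTBF + MTTR), which reproduces the figures quoted in the text, and an exponential time-to-failure model for the 2-hour program run; the function names are introduced here for illustration only.

```python
import math


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability, assuming A = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def mission_reliability(mtbf_hours: float, mission_hours: float) -> float:
    """Probability of no failure during the mission, assuming a constant failure rate."""
    if mtbf_hours <= 0 or mission_hours < 0:
        raise ValueError("MTBF must be positive and mission time non-negative")
    return math.exp(-mission_hours / mtbf_hours)


if __name__ == "__main__":
    print(availability(2.0, 0.1))         # high availability ...
    print(mission_reliability(2.0, 2.0))  # ... but poor chance of finishing a 2-hour run
    print(availability(62.5, 1.0))        # second example in the text
```

The contrast between the first two printed values is the reason a minimum MTBF or maximum MTTR is often specified alongside A*.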

2.3.2  Specification  for  Reliability 

Reliability  may  be  specified  as  a  constraint 
on  availability  as  indicated  in  §  2.3.1.  How¬ 
ever,  some  mission  phases  do  not  permit  re¬ 
pair.  In  these  cases  reliability  must  be  speci¬ 
fied  as  it  is  the  only  RMA  parameter  of  in¬ 
terest. 

As  in  the  case  of  availability,  a  single  quan¬ 
titative  requirement  for  reliability  designated 
herein  as  R*  can  be  established.  The  specifi¬ 
cation  of  design  requirements  or  objectives 
is intended to guide the design effort. Reliability
prediction such as prescribed by
MIL-STD-756 [7] is often used as a measure of
compliance.

2.3.3  Specification  for  Maintainability

Frequently  quantitative  values  are  estab¬ 
lished  for  system  maintainability  (M)  via 
equation  (2-2)  given  previously  for  avail¬ 
ability  (A).  When  this  equation  is  solved  for 
mean-time-to-repair  (MTTR),  we  have 

MTTR = ((1 - A)/A) x MTBF    (2-10)

Consequently, if both availability and reliability
are specified, maintainability (MTTR)
is also specified. As previously indicated,
there  are  numerous  combinations  of  R  and 
M  that  can  be  chosen  to  meet  availability  re¬ 
quirements.  The  one  that  is  chosen  should  be 
one  that  is  an  optimum  when  all  principal 
relevant  factors  are  considered  such  as  R  and 
M  program  costs,  state-of-the-art,  and  main¬ 
tenance  policy. 
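
A short sketch of how the implied MTTR can be obtained once availability and MTBF are fixed. It assumes the availability relation A = MTBF / (MTBF + MTTR), solved for MTTR as in equation (2-10); the function name and inputs are illustrative assumptions.

```python
def allowable_mttr(availability: float, mtbf_hours: float) -> float:
    """MTTR implied by a specified availability and MTBF: MTTR = ((1 - A) / A) * MTBF."""
    if not 0.0 < availability <= 1.0:
        raise ValueError("availability must be greater than 0 and at most 1")
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return (1.0 - availability) / availability * mtbf_hours


if __name__ == "__main__":
    # Hypothetical requirement: A* = 0.95 with a specified MTBF of 100 hours.
    print(allowable_mttr(0.95, 100.0))
```

Evaluating the function over a range of candidate MTBF values is one simple way to tabulate the R and M combinations that meet a given availability requirement.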

In  addition  to  specifying  MTTR,  it  is  often 
desirable  to  also  specify  a  maximum  time  to 
repair.  This  maximum  repair  time  is  that 
value  below  which  a  specified  percent  of  all 
corrective maintenance tasks should be completed.
It is customary for this value to be
synonymous with the 95th percentile point in
the distribution of corrective maintenance
down times. It is often desirable to specify
diagnostic requirements (e.g., fault detection
capability, fault isolation time) to restrict the
R&M trade-off region.
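
For observed data, the maximum repair time can be estimated as an empirical percentile of the recorded corrective maintenance down times. The sketch below uses the nearest-rank method, which is an assumption made here; the manual does not prescribe an estimator, and a fitted distribution may be preferred when the sample is small.

```python
import math
from typing import Sequence


def max_repair_time(repair_times: Sequence[float], percent: float = 95.0) -> float:
    """Nearest-rank percentile: smallest observed time covering `percent` of the tasks."""
    if not repair_times:
        raise ValueError("at least one repair time is required")
    if not 0.0 < percent <= 100.0:
        raise ValueError("percent must be greater than 0 and at most 100")
    ordered = sorted(repair_times)
    rank = math.ceil(percent * len(ordered) / 100.0)  # 1-based rank in the sorted sample
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical down times in hours for 20 corrective maintenance tasks.
    sample = [float(h) for h in range(1, 21)]
    print(max_repair_time(sample))
```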

2.3.4  Demonstration  Requirements 

The discussion of demonstration is limited
to reliability demonstration in this manual. A
maintainability demonstration would be designed
to prove, by test, that the actual
MTTR is less than the specified value
(MTTR*) and that 95% of the repair times
are less than the specified maximum repair
time [8]. An availability demonstration
could  directly  demonstrate  that  the  A*  re¬ 
quirement  is  met  or  it  could  measure  the  reli¬ 
ability  and  maintainability  parameters  and  de¬ 
termine  the  achieved  availability. 

Demonstration  tests  are  formal  tests  de¬ 
signed  to  accept  or  reject  the  hypothesis  that 
the  design  meets  the  requirement.  The  actual 
reliability  achieved  by  a  system  cannot  be 
measured  as  a  point  on  the  probability  scale, 
with  positive  statistical  confidence  during 
the  system’s  life.  Enforcement  of  the  speci¬ 
fication  is,  therefore,  facilitated  by  providing 
suitable  tests  to  demonstrate  compliance  with 
its  requirements. 

In  this  manual  we  identify  and  illustrate 
the  application  of  several  principal  types  of 
reliability  demonstrations  in  Section  8. 

A  complete  reliability  specification  is 
shown  in  figure  2-5. 


Quantities Specified

Design       Producer's    Minimum        Consumer's
Objective    Risk          Reliability    Risk

R*           α             R_L            β

Figure 2-5. Complete Reliability Specification


R*  is  the  design  requirement,  that  is  the 
reliability  which  must  be  exhibited  in  oper¬ 
ational  use.  The  contractor  should  conduct 
his  design  program  to  meet  this  requirement. 

α is the producer's risk, the probability
that a design with reliability of R* will fail
the demonstration test.

R_L is a demonstration test parameter, commonly
referred to as the minimum acceptable
reliability.

β is the consumer's risk. It is the probability
that a design with reliability of R_L will
pass the demonstration test.
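
To make the two risks concrete, the sketch below evaluates them for a simple fixed-sample attribute test: n independent trials, with the design accepted if no more than c failures occur. This test plan is an assumption chosen for illustration, not one of the demonstration plans of this manual, and all names and numbers are hypothetical.

```python
from math import comb
from typing import Tuple


def accept_probability(reliability: float, trials: int, max_failures: int) -> float:
    """Binomial probability of observing at most `max_failures` failures in `trials`."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if trials <= 0 or not 0 <= max_failures <= trials:
        raise ValueError("need trials > 0 and 0 <= max_failures <= trials")
    q = 1.0 - reliability  # per-trial failure probability
    return sum(
        comb(trials, k) * q**k * reliability ** (trials - k)
        for k in range(max_failures + 1)
    )


def risks(r_design: float, r_min: float, trials: int, max_failures: int) -> Tuple[float, float]:
    """Return (producer's risk, consumer's risk) for the accept/reject rule above."""
    alpha = 1.0 - accept_probability(r_design, trials, max_failures)  # good design rejected
    beta = accept_probability(r_min, trials, max_failures)            # marginal design accepted
    return alpha, beta


if __name__ == "__main__":
    # Hypothetical plan: 10 trials, accept only on zero failures.
    print(risks(r_design=0.99, r_min=0.90, trials=10, max_failures=0))
```

Increasing the number of trials while adjusting the allowed failures is the usual way to drive both risks down together.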

The decimal values assigned to R* and
R_L in figure 2-5 are converted to MTBF
values θ0 and θ1, respectively, when a demonstration
test plan from MIL-STD-781 [9] is
to be used.
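
One way to carry out this conversion, assuming an exponential time-to-failure distribution and a known mission duration t, is to solve R = exp(-t/θ) for θ. Both the exponential assumption and the function name are introduced here for illustration; the conversion is not valid if the hazard rate is not constant.

```python
import math


def reliability_to_mtbf(reliability: float, mission_hours: float) -> float:
    """MTBF (theta) implied by a mission reliability, assuming R = exp(-t / theta)."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must be strictly between 0 and 1")
    if mission_hours <= 0:
        raise ValueError("mission duration must be positive")
    return -mission_hours / math.log(reliability)


if __name__ == "__main__":
    # Hypothetical values: design objective 0.95 and minimum 0.90 over a 10-hour mission.
    theta_0 = reliability_to_mtbf(0.95, 10.0)
    theta_1 = reliability_to_mtbf(0.90, 10.0)
    print(theta_0, theta_1, theta_0 / theta_1)  # the last value is the discrimination ratio
```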

In those cases where the mission of a
weapon  system  includes  more  than  one  phase, 
it  may  not  always  be  necessary  to  establish  R 
demonstration  requirements  for  the  product 
for  each  mission  phase.  In  these  instances,  a 
possible  approach  is  to  select  the  mission 
phase  which  has  the  most  stringent  require¬ 
ments  for  the  product  and  demonstrate  that 
the  product  meets  these  most  stringent 
requirements.

Section 3

ESTABLISHMENT  AND  MANAGEMENT  OF  A  RELIABILITY 
AND  AVAILABILITY  EVALUATION  PROGRAM 


3.1  BASIC  REQUIREMENTS 

One of the major tasks facing the Strategic
Systems Project Office (SSPO) is the establishment
of system requirements. Among the
important quantitative system requirements
are those established for reliability, maintainability,
and availability (RMA).

The Department of Defense and the Navy
Department  have  recognized  the  need  for 
properly  specifying  both  quantitative  and 
qualitative  RMA  requirements  for  many 
years.  Recent  high-level  documents  support 
this  need,  e.g.: 

DoD Directive 5000.3 [1]
DoD Directive 5000.40 [2]
SECNAVINST 3900.3b [3]
NAVMATINST 3000.1 [4]
NAVMATINST 5430.53 [5]
CNM Procurement Policy Memorandum #15 [6]
NAVMAT 09H Guideline Policy #1 [7]
NAVMAT 09H Guideline Policy #2 [8]
NAVMAT 09H Guideline Policy #3 [9]

SSPO has emphasized, and will continue to emphasize,
the importance of properly specifying quantitative
and evaluation program requirements
for RMA in its weapon system procurements.
RMA evaluation program requirements are
tailored to each specific procurement based
on the Technical Objectives and Guidelines
Document (TOG) and the checklist (Appendix
B) of NAVSEA OD 21549 [10].

3.1.1  Review  of  Basic  Requirements 

One of the first steps that the contractor
should take in planning the RMA evaluation
program is to review contractual requirements
invoked via the completed checklist (Appendix
B) of NAVSEA OD 21549 [10]. This
checklist will not only indicate which RMA
evaluation program elements are invoked on
the contractor, but also what kind of modifications,
if any, have been made to the basic
requirements. The purpose of this review is to
establish the scope of the evaluation program.
The review must also include the examination
of contractual quantitative RMA requirements
derived from the TOG and invoked by
contractual specifications to establish the
degree of difficulty anticipated in meeting
these requirements.

3.1.2  Review  of  Company  Policies

Contractor  management  must  review  com¬ 
pany  policies,  procedures,  guidelines  and 
operating  instructions  in  light  of  the  review 
conducted in § 3.1.1. Necessary modifications,
additions, and updates should be undertaken
for this program.

3.2  ELEMENTS  OF  RMA  PROGRAM
PLANNING  AND  MANAGEMENT 

The following RMA program elements are
required by NAVSEA OD 21549 [10].

3.2.1  Management  Policy 

The  contractor  must  establish  and  main¬ 
tain  a  documented  policy  for  fulfilling  con¬ 
tractual  RMA  requirements.  Statements  of 
policy  shall  form  the  basic  guidelines  and  the 
internal  company  authority  for  developing 
and implementing the RMA program. Specific
responsibilities must be assigned and action
authorities  clearly  delineated.  Personnel  per¬ 
forming  RMA  evaluation  functions  must 
have  sufficient,  well  defined  responsibility, 
authority  and  organizational  independence  to 
fulfill  specified  requirements,  to  identify  and 
evaluate  RMA  problems  and  to  accomplish 
and verify corrective action.

3.2.2  Program  Planning

Planning  of  the  RMA  evaluation  program 
must  be  initiated  at  the  earliest  feasible 
moment  to  assure  that  methods  and  controls 
necessary  for  fulfilling  specified  requirements 
are  developed  in  advance  of  the  necessity  for 
their  implementation  and  maintained  as  nec¬ 
essary  throughout  applicable  life-cycle  phases. 
The program should use established procedures
and instructions, augmented as necessary,
to meet RMA requirements. These procedures
and instructions must be complete
and concise, of a type appropriate for the
functions to be controlled, define the responsibilities,
and provide the methods and criteria
for performance.

Planning  should: 

•  demonstrate  an  awareness,  recognition, 
and  organized  approach  to  the  achievement  of 
RMA  requirements. 

•  assure  that  adequate  controls  are  main¬ 
tained  throughout  all  phases  of  contract  per¬ 
formance. 

•  provide  for  smooth  transition  of  the 
RMA  program  throughout  all  phases  of  con¬ 
tract  performance. 

•  provide  objective  evidence  of  the  effec¬ 
tive  implementation  and  operation  of  the 
RMA  program. 

3.2.3  Program  Plan  Matrix 

The  contractor  shall  prepare  a  Program 
Plan  Matrix  (PPM)  to  indicate  the  means  of 
complying  with  specified  RMA  evaluation 
requirements.  The  PPM  requirements  for 
evaluation  are  the  following.  Identify,  by 
paragraph number of NAVSEA OD
21549 [10], the documents that satisfy each
RMA  requirement  and  the  organization  that 
has  primary  responsibility  for  implementa¬ 
tion.  When  documents  are  not  available  or  are 
inadequate  for  satisfying  specific  require¬ 
ments,  additional  documents  required  should 
be  identified  and  an  estimated  completion 
date  specified.  The  PPM  should  be  maintained 
to  reflect  current  documentation  and  organi¬ 
zational  responsibilities.  In  addition,  the  PPM 
should  include  milestones  and  schedules  for 
accomplishing  each  RMA  requirement. 


3.2.4  Integrated  Data  System 

The contractor shall establish and maintain
a system for the effective collection,
control, processing and use of data generated
to support design engineering; the quality,
reliability and maintainability programs; the
subsystem safety program; and the corrective
action system. The collection, processing,
storage,  maintenance,  retrieval,  control  and 
distribution  procedures  should  be  designed  to 
meet  these  detailed  data  needs.  The  system 
should  be  designed  to  assure  that  records  of 
similar  or  related  data  elements  from  various 
contractor  internal  functional  areas,  sub¬ 
contractors  and  other  external  sources  are 
compatible  for  the  purpose  of  retrieval  and 
analysis.  The  supporting  documentation 
should  include: 

•  A  list  of  logs,  forms,  and  other  media 
used  to  record  data,  along  with  the  descrip¬ 
tion  and  storage  location  of  these  documents. 

•  A  list  and  description  of  output  reports, 
indicating  the  preparing  organization  and 
periodicity. 

•  A  tabulation  of  applicable  procedures 
that  describe  the  preparation  and  flow  of 
input  data  and  output  reports. 

Additional  details  are  provided  later  in  this 
manual  on  data  needs  for  RMA. 

3.2.5  Corrective  Action  System 

The  contractor  shall  establish  and  maintain 
a  system  for  corrective  action  of  problems/ 
failures.  The  system  shall  include:  reporting 
of  problems/failures,  investigation,  analysis, 
and  performance  of  actions  to  correct  prob¬ 
lems/failures  and  preclude  recurrence.  The 
system  should  use  problem/failure  data  from 
tests  and  inspections  throughout  the  contract 
effort.  Reporting  of  problem/failure  data  and 
corrective  actions  should  be  in  a  form  to  as¬ 
sure  smooth  transition  and  integration 
through  the  various  phases  of  program  per¬ 
formance. 

A  key  element  of  a  development  program 
is  effective  corrective  action  (CA).  Repair 
and  maintenance  actions  are  considered  dis¬ 
position  CA  since  they  only  affect  the  failed 
item. Effective CA will eliminate or reduce
future item rejections and failures. CA may
apply  to  both  relevant/non-relevant  as  well  as 
verified/not  verified  (false  alarm)  failures. 
(Note:  non-relevant  and  not  verified  failures 
also  force  equipment  out  of  service.) 

Figure  3-1  shows  the  elements  of  a  simpli¬ 
fied  corrective  action  system,  and  the  manner 
in  which  it  provides  feedback  to  manufactur¬ 
ing  and  R&R  activities  to  stimulate  reliability 
growth. 

CA  usually  cannot  be  determined  at  the 
time  the  failure  is  observed.  It  may  not  even 
be  determinable  after  verification  testing  and 
teardown.  Detailed  failure  analysis  is  usually 
required to develop effective CA. This includes
a review of all data (observations, test
findings,  lab  analysis,  etc.)  to  make  a  deter¬ 
mination  as  to  the  cause  of  failure;  then 
effective  CA  can  be  developed  and  approved 
through  the  corrective  action  board  (CAB). 
The  results  of  the  failure  analysis  should  be 
documented  in  a  comprehensive  failure  anal¬ 
ysis  report  which  is  conclusionary  in  nature 
(i.e.,  specifies  the  results  of  the  analysis,  the 
cause of failure, CA implemented, implementation
data, etc.).

The results of the failure reports are then
used to measure the effectiveness of corrective
actions. Failure recurrence, with CA
implemented, can be plotted as a function of
time to determine CA effectiveness.

3.2.6  Documentation 

The  contractor  shall  develop  and  maintain 
those  documents  necessary  to  fulfill  specified 
RMA  requirements.  A  system  for  scheduling 
and  monitoring  the  preparation  of  these  docu¬ 
ments  should  be  maintained  to  assure  that 
preparation  is  timely  in  relation  to  program 
milestones. 

3.2.7  Audits 

The  contractor  shall  audit  his  RMA  pro¬ 
gram  periodically  to  determine  compliance 
with  specified  RMA  requirements.  Audits 
should  be  planned  to  begin  at  the  start  of  the 
contract  and  should  be  conducted  throughout 
the  life  of  the  contract.  Planning  should  con¬ 
sider  program  milestones  and  delineate  cri¬ 
teria  for  audit  scope  and  frequency.  Audits 
should be performed by an independent audit
group or by personnel not having responsibilities
in the area being audited. Results of each
audit  should  be  documented  in  a  report  to 
appropriate  managers  and  supervisors.  Action 
should  be  taken  to  assure  timely  correction  of 
deficiencies  and  follow-up  performed  to  ver¬ 
ify  effectiveness  of  corrective  action. 

3.2.8  Configuration  Management 

The  contractor  shall  establish  and  imple¬ 
ment  a  program  for  configuration  manage¬ 
ment  of  deliverable  hardware  and  software 
(including  training  and  special  test  and  inspec¬ 
tion  equipment  with  concomitant  software). 
The  program  shall  assure  implementation  of 
requirements  for  configuration  identification, 
control,  status  accounting  and  verification  in 
accordance with SSPINST 4130.4 [11].

3.2.9  SPALT  Management 

The  contractor  shall  establish  and  imple¬ 
ment  a  program  for  monitoring  and  providing 
data  regarding  the  proposal,  development, 
production  and  implementation  of  each  essen¬ 
tial  SPALT  in  accordance  with  SSPINST 
P4720.1 [12].

3.3  ELEMENTS  OF  A  RELIABILITY  AND 
AVAILABILITY  EVALUATION 
PROGRAM 

The  basic  elements  of  a  reliability  and  avail¬ 
ability  evaluation  program  and  the  flow  of 
associated activities, promulgated in NAVSEA
OD 21549 [10], are shown in figure 3-2. The
various  elements  are  discussed  in  detail  else¬ 
where  in  this  manual.  The  paragraphs  are 
identified  in  the  figure. 

3.3.1  Reliability  and  Availability  Analysis 

A  major  element  of  an  evaluation  program 
is  reliability  and  availability  analysis.  This 
analysis consists of three major functions:
mission  analysis,  system  analysis,  and  analysis 
of  the  integrated  test  program,  and  is  dis¬ 
cussed  in  Section  4.  It  is  supported  by: 

•  A  data  system  function  responsible  for 
collecting,  controlling,  processing,  and  using 
data  from  tests,  operations  and  maintenance 
activities.

Figure 3-1. Simplified Corrective Action System

Figure 3-2. Flow of Activities in a Reliability and Availability Evaluation Program

•  A failure analysis and classification function
responsible for analyzing and classifying
each failure, providing corrective action and
closing out each failure.

•  A  statistical  analysis  function  for  neces¬ 
sary  statistical  support,  including  determining 
confidence  bounds  on  estimated  reliability 
and  availability. 

3.3.2  Data  System 

A  data  system  is  needed  to  support  reliabil¬ 
ity  and  availability  evaluation.  Each  of  the 
three  functions  of  reliability  and  availability 
analysis  (mission  analysis,  system  analysis, 
and analysis of the integrated test plan) helps
to generate evaluation data requirements.
System analysis defines the hardware and
the software for which data are to be collected;
mission analysis defines the applicable
environments  and  duty  cycles;  analysis  of  the 
integrated  test  program  identifies  tests  rele¬ 
vant  to  reliability  and  availability  evaluation. 

Four  functions  comprise  a  data  system. 
They  are  collection,  control,  processing  and 
utilization.  An  effective  data  system  embody¬ 
ing  each  of  these  must  be  established  during 
the  development  phase.  Its  basic  structure 
should  be  sufficiently  flexible  so  that  with 
minimum  modifications,  principally  to  the 
collection  and  control  functions,  subsequent 
operational  data  from  subsystems  in  fleet 
service  can  be  employed  to  extend  evaluation 
into  the  fleet  use  phase. 

Data systems are discussed in detail in Section 9.

3.3.3  Failure  Analysis  and  Classification 

The  primary  goal  of  failure  analysis  is  reli¬ 
ability  and  availability  improvement.  This 
improvement  is  obtained  through  the  engi¬ 
neering  analysis  of  each  failure  in  order  to 
identify  the  mechanism  and  cause  of  failure 
and  to  recommend  appropriate  corrective 
action.

Failure  analysis  supports  the  reliability  and 
availability  evaluation  process  by  enabling 
proper  classifications  to  be  made  of  each  fail¬ 
ure.  Guidelines  for  failure  classification 
should  be  established  before  testing  begins. 
Various classifications are useful for reliability
and availability evaluation. Most important
is that each failure must be classified
as relevant or non-relevant; only relevant failures
are counted in computing reliability and
availability. Rules for making this classification
should be established. Other classification
criteria such as severity (catastrophic,
critical, major, minor) may also impact reliability
and availability measurement and
should be defined specifically for each system.
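
The counting rule above can be sketched as follows. The record layout, field names, and the simple point estimate (operating hours divided by relevant failures) are assumptions made for illustration; actual classification rules must be defined for each system before testing begins.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FailureReport:
    """Hypothetical failure record; the field names are illustrative only."""
    report_id: str
    relevant: bool  # result of applying the agreed relevancy rules
    severity: str   # e.g. "catastrophic", "critical", "major", "minor"


def count_relevant(reports: Iterable[FailureReport]) -> int:
    """Number of failures that count toward reliability and availability."""
    return sum(1 for report in reports if report.relevant)


def observed_mtbf(operating_hours: float, reports: Iterable[FailureReport]) -> Optional[float]:
    """Point estimate: operating hours per relevant failure (None if there are none)."""
    relevant = count_relevant(reports)
    if relevant == 0:
        return None  # no relevant failures, so the simple ratio is undefined
    return operating_hours / relevant


if __name__ == "__main__":
    log = [
        FailureReport("F-001", True, "major"),
        FailureReport("F-002", False, "minor"),  # e.g. induced by a test equipment fault
        FailureReport("F-003", True, "critical"),
    ]
    print(count_relevant(log), observed_mtbf(500.0, log))
```

Non-relevant failures remain in the log for corrective action purposes even though they are excluded from the estimate.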

The  failure  analysis  system  is  a  closed-loop 
process,  in  which  all  failures  are  evaluated  to 
determine  the  need  for  corrective  action  and 
each  failure  is  closed  out  by  an  appropriate 
corrective  action. 

3.3.4  Statistical  Analysis 

Reliability  and  availability  parameter  esti¬ 
mates  are  obtained  by  using  appropriate 
models.  This  manual  describes  several  meth¬ 
ods  approved  for  selected  use  with  proper 
application  in  SSPO  programs. 

3.4  IMPLEMENTING  A  RELIABILITY 
AND  AVAILABILITY  EVALUATION 
PROGRAM 

In  support  of  its  decision-making  functions, 
SSPO  requires  reliability  and  availability  eval¬ 
uations  beginning  in  the  development  phase, 
before  the  first  fabrication  of  hardware  and 
software  for  testing  and  continuing  through 
the  system’s  operational  or  fleet  use  phase. 
Implementation  of  a  program  to  meet  these 
needs  is  a  joint  activity  of  the  weapon  system 
manager  and  the  system  contractors.  To  this 
end,  SSPO  weapon  system  management  speci¬ 
fies  top-level  system  reliability  and  avail¬ 
ability  requirements  at  the  onset  of  the  de¬ 
velopment  program.  SSPO  management  also 
stipulates  the  need  for  reliability  and  avail¬ 
ability  evaluation  as  part  of  the  Weapon  Sys¬ 
tem Requirements Specification and in companion
documents such as NAVSEA OD
21549 [10]. In support of the contractor's
analytical  tasks,  as  described  in  Section  4 
of  this  manual,  SSPO  also  provides  informa¬ 
tion  on  the  intended  mission,  system  inter¬ 
faces,  schedules,  and  the  operational  use,  and 
logistic  environments.  Later,  SSPO  manage¬ 
ment  will  also  function  to  review  and  inte¬ 
grate  contractor’s  outputs  for  the  evaluation 
of the reliability and availability of the complete
weapon system.

The contractor's responsibilities include establishing
reliability and availability evaluation
as program activities by means of policy
directives and designation of organizational
responsibilities and authority for the analysis,
data, reporting, and corrective action functions
which comprise the program. Figure
3-3  illustrates  SSPO/contractor  responsibil¬ 
ities  and  relationships  in  a  weapon  system 
evaluation  program. 

To implement the evaluation program for a
particular system, the contractor must perform
the mission and system analysis described
herein, develop the necessary mathematical
models for reliability and availability,
and determine the type and quantity of data
necessary to solve the models. The contractor
must analyze the integrated test program
to determine the quantity and quality
of data applicable to reliability and availability
evaluation. Procedures, instructions and
forms must be developed to collect, monitor
and process data from the test and operational
sites. A manual system, computer programs
or a combination thereof for processing
the data and generating tabular portions of
reports must also be developed. Most
important, if the evaluation program is to be
of benefit to the project, is the need for a
suitable internal information loop to assure
that contractor management and other cognizant
project personnel are made aware of the
current system reliability and availability,
particularly system elements that may require
corrective action to achieve satisfactory
reliability or availability. Timeliness is of the
essence.

Most  contractors  have  company  policies, 
procedures,  and  operating  instructions  which 
cover reliability and availability in detail.
These  should  be  reviewed  against  the  require¬ 
ments  for  the  particular  system,  and  the  cur¬ 
rent  contractor  organization  structure,  to  as¬ 
sure  that  they  are  complete  and  up-to-date. 

With  the  beginning  of  testing,  the  data  sys¬ 
tem  is  implemented  to  collect,  control,  pro¬ 
cess  and  utilize  data  to  measure  reliability  and 
availability.  Status  reports  are  issued  at  inter¬ 
vals  as  required  by  contract.  The  information 
loop  is  closed  by  feedback  of  evaluation  re¬ 
sults,  through  the  contractor's  management 
structure,  to  the  engineering,  production  and 
assurance  activities. 


7.  NAVMAT 09H Guideline Policy #1:
Specification of Reliability and Maintainability
Requirements in Navy Contracts, 1973.

8.  NAVMAT 09H Guideline Policy #2:
Specification of Reliability and Maintainability
Requirements in Navy Contracts, 1974.

9.  NAVMAT 09H Guideline Policy #3:
Generation of Mission Profiles, 1974.

10.  NAVSEA OD 21549A: Technical Program
Management Requirements for
[...]on System Equipment.

Section 4

RELIABILITY  AND  AVAILABILITY  ANALYSIS 


Three types of analysis are discussed in this
section, namely, mission analysis, system analysis,
and analysis of the integrated test program.

4  1  MISSION  ANALYSIS 

Mission  analysis  is  a  process  for  defining 
with  precision  the  mission  for  which  avail¬ 
ability  and  reliability  are  to  be  evaluated. 
Mission  analysis  identifies  hardware  and  soft¬ 
ware  performance  functions,  success  criteria, 
duty  cycles,  environmental  stress  levels,  and 
time of exposure to each environment. The
TOG  is  used  as  the  baseline  for  the  mission 
analysis.  The  results  of  the  mission  analysis 
are  used  in  the  development  of  design  spec¬ 
ifications  for  the  subsystem  and  lower  level 
elements.  Mission  analysis  for  reliability  and 
availability  evaluation  includes  the  definition 
of  mission  phases  and  the  development  of 
mission  (environmental  stress  and  duty  cycle) 
profiles. 

4.1.1  Definition  of  Mission  Phases 

The  mission  is  represented  as  a  sequence  of 
events.  When  a  range  of  alternative  missions  is 
possible,  each  is  examined  as  a  distinct  event 
sequence.  Each  total  mission  is  then  separated 
into  phases.  For  systems  such  as  FBMWS/SWS 
four  phases  are  normally  defined: 

a.  Logistic  phase  (including  transportation, 
handling,  storage,  processing,  refit,  and  test). 

b.  Operational  readiness  phase. 

c.  Launch  phase. 

d.  Flight  phase. 

A reliability figure of merit is usually appropriate
to any mission phase that begins with a demand on the
system or device under analysis, such that the mission
cannot be successful if failure occurs at any time
during the phase. Normally, the launch and flight
phases are in this category. An availability figure of
merit is appropriate to any mission phase in which
success can follow failure and repair. The operational
readiness phase is in this category.

The general sequence of mission phases for
a FBMWS/SWS patrol is shown in figures 4-1
and 4-2. (It should be noted that the logistic
phase occurs before and after a patrol.)

The weapon system's tactical mission is
defined for a patrol period, tr, which in the
absence of a demand for launching of missiles
is of nominal duration, T. For this mission
the entire patrol is the operational readiness
phase (Figure 4-1A). An availability
figure of merit is appropriate in this case.

Should a demand be made on the weapon
system, that demand terminates the operational
readiness phase and initiates the ensuing
launch phase, which may include a hold
period, at time t1, and the flight phase at time t2
(Figure 4-1B).

A  successful  mission  precludes  failure  of 
any  on-line,  non-redundant  system  element 
during  the  launch  phase,  regardless  of  whether 
the  element  is  repairable.  This  is  a  valid  con¬ 
straint  for  purposes  of  hardware  evaluation, 
even  though  it  may  be  possible  in  a  tactical 
situation  for  the  submarine’s  crew  to  effect 
one  or  more  missile  launches  despite  certain 
equipment  failures.  Thus,  the  figure  of  merit 
during  the  launch  phase  (and  of  flight  hard¬ 
ware  and  software  in  the  flight  phase)  is 
reliability  or  probability  of  failure-free  opera¬ 
tion. 

In the development of the mission profiles
for each phase, the Analysis for Design paragraph
of NAVSEA OD 21549 [11] requires
that the most severe design constraints be
identified.
Figure 4-1B. Patrol With Launch


Figure 4-1. Definition of Tactical Mission Phases

4.1.2 Development of Mission Profiles

A mission profile is developed for each
phase defined. Two kinds of information are
developed in the preparation of a mission profile:
an environmental profile containing the
level and duration of exposure to each applicable
environmental stress, and the related duty
cycle profile (i.e., whether the device is operating,
non-operating, or cycling) for the hardware
or software.

To  prepare  a  mission  profile  the  analyst 
lists  the  operational  modes  of  the  system  in 
each  mission  phase.  Performance  functions 
required  by  each  mode  in  each  phase  are  then 
listed  and  associated  with  the  hardware  and 
software  necessary  to  accomplish  them.  A 
form  having  the  general  information  content 
illustrated  in  figure  4-3  is  helpful  in  organizing 
this  portion  of  the  analysis.  In  general,  not 
all  of  a  system’s  functions  will  be  equally  im¬ 
portant  to  the  mission.  Thus  it  is  necessary  to 
define  the  minimum  limits  of  successful  per¬ 
formance  for  purposes  of  reliability  and 
availability  analysis.  This  is  accomplished  by 
listing  that  subset  of  the  performance  func¬ 
tions  that  is  essential  to  the  primary  mission. 

Performance  times  required  for  each  of  the 
essential  functions  are  then  listed.  Where  per¬ 
formance  times  are  random  variables,  their 
maximum  values  should  be  used  to  ensure  a 
conservative  approach.  Environmental  levels 
that  depart  significantly  from  room  ambient 
are then listed, together with maximum times
of system exposure to those environments for
each operational hardware state (A, B, C, D)
as defined below.

Reliability models can sometimes be simplified
considerably by adopting the mission
as a common unit of time. When this is done
the term T representing mission length becomes
unity and drops from the reliability
equation, and failure rates are expressed in failures
per mission. The reliability under evaluation is

$$ R(T) = e^{-\int_0^T h(t)\,dt} \quad \text{or} \quad R(1) = e^{-\int_0^1 h(t)\,dt} $$

and for the exponential case this becomes

$$ R(T) = e^{-\lambda T} \quad \text{and} \quad R(1) = e^{-\lambda} $$

and need not be treated as a time function. All
elements of the system are analyzed in terms of
their defined mission. This normalization simplifies
the task of combining element reliabilities in
system models.
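
As a hedged illustration of the normalization above, the following Python sketch assumes a constant hazard rate (the exponential case) and shows that expressing the failure rate in failures per mission removes T from the calculation. The function names and the numerical values are illustrative assumptions and are not taken from this manual.

```python
import math


def reliability(failure_rate: float, mission_length: float) -> float:
    """R(T) = exp(-lambda * T) for a constant failure rate."""
    return math.exp(-failure_rate * mission_length)


def mission_reliability(failures_per_mission: float) -> float:
    """R(1) = exp(-lambda) when lambda is expressed in failures per mission."""
    return math.exp(-failures_per_mission)


# Hypothetical item: 0.002 failures per hour on a 5-hour mission.
rate_per_hour = 0.002
mission_hours = 5.0

# Normalizing to the mission as the unit of time.
rate_per_mission = rate_per_hour * mission_hours

r_time_based = reliability(rate_per_hour, mission_hours)
r_mission_based = mission_reliability(rate_per_mission)
```

Both calculations give the same reliability; the mission-based form simply carries the mission length inside the failure rate.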

The coefficient α is used to convert test
time from units of minutes or cycles to units
of equivalent missions. For a given hardware
item, alpha is a reciprocal of mission exposure
time, either environmental or operational. Its
dimensions are missions per minute or cycle,
whichever is appropriate to the device. Figure
4-4 is an example of a table of α coefficients
for hardware elements of a system. The
following information is also tabulated for
each component:

Environment - Mission environmental
stresses experienced by the component.
Operating life is considered as a separate
environment.

Hardware State - Four hardware states are
defined as listed below.


[Figure 4-3 is a blank form. Header fields: Phase Duration;
Mission Phase; Space or Distance. Column headings: System Mode;
Function; Related Hardware and Software; Function and Time
Duration; Success Criteria; Duty Cycle; Environment and Time
Duration.]

Figure 4-3. Information Required for the Development of Mission Profiles
State A - Non-operating but must
survive and be operational in a later
mission phase.

State B - Non-operating but must not
operate prematurely and must be
operational in a later mission phase.

State C - Operating. Duration countable
in cycles or discrete events.

State D - Operating. Duration measurable
in units of time.

4.1.2.1 The Environmental Profile

The  environmental  profile  is  an  important 
element  of  the  mission  profile.  It  lists  antic¬ 
ipated  exposure  of  each  hardware  item  to 
environmental  stresses  (e.g.,  temperature,  vi¬ 
bration,  shock,  acceleration,  radiation)  during 
each  mission  phase. 

An  environmental  profile  includes  as  a  min¬ 
imum  the  time  duration  of  each  mission  phase 
and  the  level  of  each  environment  imposed  on 
the  item  during  each  mission  phase. 

Environmental  Profile  information  is  re¬ 
quired  for  the  development  of  hardware  spec¬ 
ifications  and  the  planning  of  test  activities  to 
ensure  that  hardware  is  designed  to  survive 
the  environments  that  will  be  encountered  in 
all  mission  phases  including  transportation, 
storage,  handling,  mating  and  checkout,  as 
well  as  in  use,  and  that  the  ability  to  with¬ 
stand  all  environments  is  verified  in  the  in¬ 
tegrated  test  program  by  test,  analysis,  or 
both. 

Environmental  Profiles  are  not  required  for 
software.  Software  consists  of  instructions, 
independent  of  the  storage  medium,  and  is 
therefore  not  directly  affected  by  the  physical 
environment.  Software  may  be  stored  on  mag¬ 
netic  tapes  or  disks,  paper  tapes  or  cards,  or 
in  a  core  memory.  The  hardware  in  which 
software  is  stored  is  often  affected  by  its 
physical  environment  (e.g.,  magnetic  or  elec¬ 
trostatic  fields).  A  hostile  environment  may 
erase  all  or  part  of  a  correct  software  program 
if  the  storage  medium  is  not  properly  protect¬ 
ed,  but  this  is  considered  a  hardware  problem. 
The  physical  environment  may  indirectly  af¬ 
fect  software  performance  in  other  ways.  The 
response  of  input  sensors  to  environmental 
variables  may  affect  the  logical  path  taken 
through a computer program, thereby revealing
a software error not present in other
paths. In this case also, the environment has
not  actually  affected  the  software’s  reliability; 
the  defect  was  always  present,  though  un¬ 
revealed. 

4.1.2.2 The Duty Cycle Profile

The duty cycle profile defines the state
(operating, non-operating or cycling) of each
item in a system during each mission phase. It
includes as a minimum: 1) the time duration,
distance, number of cycles, etc. of each mission
phase; 2) a description of what each item
must do during the mission phase, including its
success or failure criteria; 3) the anticipated total
time, cycles, etc. in each state (operating,
non-operating, cycling) during each mission
phase. Exposure time is normally measured in
minutes for operating and non-operating
states. The total number of cycles occurring
during the mission phase is used for the
cycling state. A duty cycle profile is of less
importance during the logistic phase, when
most equipment is not operating; however, it
must be considered when applicable.


4.1.2.3 Alpha Values

The  alpha  value  for  each  environment  is  the 
reciprocal  of  the  time  (in  minutes,  cycles  or 
other  appropriate  units)  during  which  the  sys¬ 
tem  will  be  exposed  to  the  environment 
during  the  mission.  Its  purpose  is  to  convert 
failures  per  unit  time  or  cycle  to  failures  per 
mission.  The  alpha  values  normalize  data  to 
failures  per  mission  permitting  them  to  be 
combined  conveniently. 

Example:  An  equipment  consists  of  three 
serially  related  components;  it  operates  in  a 
benign  environment  and  has  a  one-hour 
mission. 

The  duty  cycles  of  the  components  are: 
Component  A  operates  one  hour 

Component  B  operates  40  minutes 

Component  C  operates  20  minutes 

Alpha values for the components are 1/60,
1/40, and 1/20 respectively.

Now if 1,000 hours of equipment level
testing and 1,000 hours of component testing
are accumulated on each component, total
accumulated test time on each component in
equivalent missions is:
            From Equipment   From Component Test             Total
            Test (Equiv      Minutes   Alpha     Equiv       Equiv
Component   Missions)                  Values    Missions    Missions

A           1,000            60,000    1/60      1,000       2,000
B           1,000            60,000    1/40      1,500       2,500
C           1,000            60,000    1/20      3,000       4,000

The  need  to  normalize  to  equivalent  mis¬ 
sions  as  a  time  base  stems  from  the  desire  to 
use  component-level  test  data  in  conjunction 
with  equipment-level  test  data.  A  contractor 
using  only  equipment  level  tests  would  not 
need  to  normalize  to  equivalent  missions. 

The failures charged to each component
can be divided by the component time in missions;
the sum of these failure rates is the
equipment failure rate in failures per mission.
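
The example above can be sketched in Python as follows. The duty cycles and test times come from the example; the failure counts charged to each component are hypothetical values introduced only to illustrate the final division and sum, and all variable and function names are assumptions of this sketch.

```python
from fractions import Fraction

MISSION_MINUTES = 60  # one-hour mission, from the example

# Operating minutes per mission (duty cycle) for each component.
operating_minutes = {"A": 60, "B": 40, "C": 20}

# Alpha is the reciprocal of mission exposure time (missions per minute).
alpha = {name: Fraction(1, minutes) for name, minutes in operating_minutes.items()}

EQUIPMENT_TEST_MINUTES = 1000 * 60  # 1,000 hours of equipment level testing
COMPONENT_TEST_MINUTES = 1000 * 60  # 1,000 hours of testing on each component


def equivalent_missions(name: str) -> Fraction:
    # Equipment level testing runs every component through whole missions.
    from_equipment = Fraction(EQUIPMENT_TEST_MINUTES, MISSION_MINUTES)
    # Component level test minutes are converted with the component's alpha.
    from_component = COMPONENT_TEST_MINUTES * alpha[name]
    return from_equipment + from_component


total_missions = {name: equivalent_missions(name) for name in alpha}

# Hypothetical failure counts charged to each component (assumed values).
failures = {"A": 2, "B": 1, "C": 1}

# Failures per mission for each component, then summed for the equipment.
component_rates = {
    name: Fraction(failures[name]) / total_missions[name] for name in alpha
}
equipment_rate = sum(component_rates.values())

print({name: int(missions) for name, missions in total_missions.items()})
print(float(equipment_rate))
```

Exact fractions are used so that the totals reproduce the tabulated equivalent missions without rounding.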

Non-operating  failure  rates  can  be  added  if 
they  are  significant.  In  benign  environments 
non-operating failure rates are usually negligible.
The non-operating alpha values for
components  B  and  C  would  be  1/20  and  1/40 
respectively,  since  the  non-operating  times 
during  each  sixty  minute  mission  are  twenty 
minutes  for  component  B  and  forty  minutes 
for  component  C.  Since  component  A  oper¬ 
ates  continuously,  its  non-operating  failure 
rate  would  not  enter  the  calculations. 

Alpha values are also needed when combined
environments are experienced in a mission.
In these cases, test data are normalized
to equivalent missions for each environment
separately. For environments such as temperature
and vibration, both non-operating
and operating failure rates and alpha values
are appropriate, since test data can provide
appropriate estimates of the failure rates.
Caution: It is always necessary to relate the
test environment to the mission environment
(e.g., § 4.1.2.4).

4.1.2.4 Acceleration Factors

Normal practice in the analysis of an integrated
test program is to accept environmental
test data as useful for reliability evaluation
if and only if the test environment is essentially
the same as the mission environment.
This must usually be evaluated on an environment
by environment basis, because the ability
to accurately duplicate the combined
environmental effects that occur in most missions
does not exist in many test facilities.

In some cases contractors may plan to run
accelerated tests. Accelerated test data can be
used for reliability evaluation when the contractor
has data to relate the failure rate in
test to the failure rate in the mission. Such a
relationship is often expressed by a k-factor
or acceleration factor. For example, if one
hour of test in some accelerated environment
is known to be equivalent to two hours of
mission exposure, the acceleration factor, k,
would be 2. The test time in equivalent missions
is then kαt. This procedure is somewhat
similar to the use of environmental factors
(π_E) in MIL-HDBK-217 [2] predictions.

Conceptually, k-factors can take on values
less than one, but the purpose of accelerated
testing is to reduce test time, thus k-factors
are normally greater than one. MIL-HDBK-217 [2]
cautions that extrapolation of environmental
modifiers is completely invalid. A
similar caution applies to the use of k-factors.
The contractor must have data to justify the
use of a k-factor used in reliability evaluation
or demonstration.

4.2  SYSTEM  ANALYSIS 

System  analysis  is  a  study  of  the  means  by 
which  the  hardware  and  software  that  com¬ 
prise  a  system  are  able  to  respond  to  the  de¬ 
mands  of  the  mission.  As  used  in  this  manual, 
the  term  encompasses  seven  activities:  1) 
listing  the  system  configuration  and  perform¬ 
ance  functions,  2)  reliability  and  availability 
modeling,  3)  reliability  and  availability  ap¬ 
portionment,  4)  reliability  and  availability 
prediction,  5)  failure  mode,  effects  and  crit¬ 
icality  analysis,  6)  fault  tree  analysis,  and  7) 
development  of  measurement  data  require¬ 
ments. 




4.2.1 Listing System Configuration and
Performance Functions

Configuration  of  the  system’s  hardware  can 
be  defined  by  a  hardware  list  indentured  by 
assembly  level.  Various  assembly  level  cat¬ 
egories  are  commonly  used.  A  typical  set  of 
categories  is: 

System 

Subsystem 

Equipment 

Component 

Module 

Part 

While  hardware  listing  is  the  simpler  ap¬ 
proach,  configuration  can  also  be  defined  by 
tabulating  the  hardware  required  to  realize 
system  functions.  A  functional  listing  is  of 
particular  value  when  a  system  is  complex, 
multifunctional  or  has  many  interfaces  with 
other  systems.  Both  approaches  are  illustrated 
in  figure  4-5. 

The  configuration  of  a  software  system  can 
also  be  defined  as  an  indentured  list.  When 
software  development  is  done  under  struc¬ 
tured  programming,  self-contained  programs, 
subprograms  and  routines,  each  separately 
compilable  and  independently  testable,  are 
programmed  to  perform  one  or  more  well- 
defined  functional  tasks.  But  it  is  usually 
simpler  to  list  software  elements  by  function. 
Usually  a  list  of  modules  can  be  directly  ex¬ 
tracted  from  a  functional  diagram  of  a  soft¬ 
ware  system.  Figure  4-6  is  a  simplified  func¬ 
tional  diagram  of  a  land-based  radar  software 
system  that  filters  raw  radar  data  to  predict 
the  trajectory  of  a  tracked  missile.  This  infor¬ 
mation  can  be  used  to  keep  the  radar  “on 
track”.  The  components  of  the  software  sys¬ 
tem  are  the  executive  program,  the  Kalman 
filter,  coordinate  transformation,  predicted 
trajectory  subprograms,  and  integration  and 
matrix  inversion  routines  which  support  the 
Kalman  filter.  A  software  system  is  composed 
of  programming  instructions.  Each  indentured 
element  must  be  given  an  identifying  number 
and  subsequent  changes  must  be  closely  con¬ 
trolled  by  a  configuration  management  sys¬ 
tem.  The  media  on  which  the  instructions  are 
stored  are  part  of  the  computer  hardware. 


COMPONENTS LISTED BY MISSION FUNCTION

Function                        Component

A  Vehicle Control
   A1  Command Link             11  Receiver
                                 4  Translator
                                 8  Programmer
                                 9  Baroswitch
   A2  Tracking                 13  Pulse Generator
                                14  Flashing Light
                                15  Beacon
   A3  Vehicle Operation         1  Vehicle Controller
                                 2  Propulsion
   A4  Power Supply              3  Batteries
B  Payload Operation            10  TV Camera
C  Monitoring                    5  Sensors
                                 6  Multicoder
                                 7  Amplifier
                                12  Transmitter


COMPONENTS LISTED BY EQUIPMENT BLOCKS

Equipment                       Component

Vehicle                          1  Vehicle Controller
                                 2  Propulsion
                                 3  Batteries
Telemetry and Command            4  Translator
                                 5  Sensors
                                 6  Multicoder
                                 7  Amplifier
                                 8  Programmer
                                 9  Baroswitch
Payload                         10  TV Camera
Communications                  11  Receiver
                                12  Transmitter
                                13  Pulse Generator
                                14  Flashing Light
                                15  Beacon

Figure  4-5.  Component  Listings 


Figure  4-6.  Trajectory  Prediction  Software  System 
4.2.2 Reliability and Availability Modeling

4.2.2.1 Reliability Modeling

A reliability model represents the manner
in which the reliability of a system depends
on the reliability of the system's constituent
elements. A reliability model consists of a
reliability block diagram and one or more
mathematical equations.

4.2.2.1.1 Reliability Block Diagram

A reliability block diagram is a logic diagram,
a graphic analog of logical events that
result in success or failure of the system. In
this sense it differs from a functional block
diagram, which is a schematic representation
of a system. (Note: The block diagrams used
in a FMECA are functional diagrams.)

Figure 4-7 is a reliability block diagram of a
simple subsystem containing three equipments,
all of which must operate for subsystem
success.


Figure  4-7.  Serial  Subsystem 

The logical expression for subsystem success
conveyed by the block diagram is

$$ S = A \cap B \cap C, $$

where S represents the event subsystem success, and
the intersection symbol, $\cap$, is interpreted as
"and". Also the reliability or probability of
subsystem success, P(S), is given by

$$ P(S) = P(A \cap B \cap C) $$

If failures of A, B, and C are statistically independent,
P(S) can be written

$$ P(S) = P(A) \cdot P(B) \cdot P(C) $$

where P(A), P(B), and P(C) are the probabilities
of the events A, B, and C respectively.

A  reliability  block  diagram  models  redun¬ 
dancy  in  terms  of  parallel  paths.  Figure  4-8 
shows  a  subsystem  in  which  equipments  A 
and  B  and  either  C  or  D  must  operate  for  the 
subsystem  to  be  successful. 

The logical expression for subsystem success
is

$$ S = A \cap B \cap (C \cup D), $$

where the union symbol, $\cup$, is interpreted as "or"
and subsystem reliability is

$$ P(S) = P[A \cap B \cap (C \cup D)] $$

Again, given equipment independence,

$$ P(S) = P(A) \cdot P(B) \cdot [P(C) + P(D) - P(C) \cdot P(D)] $$
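
The bracketed term in the last equation follows from the addition rule for the union of two events, combined with the independence assumption. A short derivation of the intermediate steps, written in LaTeX as a sketch:

```latex
\begin{align*}
P(C \cup D) &= P(C) + P(D) - P(C \cap D) \\
            &= P(C) + P(D) - P(C)\,P(D) \\
            &= 1 - [1 - P(C)]\,[1 - P(D)]
\end{align*}
```

The last line expresses the same quantity in terms of failure: the redundant pair fails only if both C and D fail.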

Reliability  block  diagrams  can  be  prepared 
for  many  systems  using  only  simple  series- 
parallel  combinations.  But  more  complex 
logical  structures  are  often  encountered. 
Many  systems  contain  switchable  standby 
elements  that  are  not  activated  until  one  or 
more  primary  elements  fail.  Redundant  con¬ 
figurations  of  the  m-out-of-n  type  [Appendix 
E]  are  also  common,  as  are  majority  voting 
schemes  and  similar  arrangements.  A  single 
component  may  be  used  to  back  up  two  or 
more  parallel  redundant  elements,  or  multiple 
standby elements can be used to back up a
single primary element. The possible configurations
are limitless.

It  is  possible  that  a  fan  or  a  battery  which 
is  not  part  of  the  subsystem  could  be  required 
for  cooling  or  graceful  degradation,  respec¬ 
tively,  in  either  figure  4-7  or  4-8.  It  is  impor¬ 
tant  that  the  model  reflect  this  dependency 
when  it  exists.  The  failure  mode,  effects,  and 
criticality  analysis  (FMECA)  described  in 
§  4.2.5  provides  a  useful  input  to  modeling 
for  dependencies  of  this  nature. 

Figure 4-8. Series-Parallel Subsystem

Usage rules also affect the structure of a
reliability block diagram. For example, a system
may contain three identical equipments,
subjected during the mission to varying levels
of functional loading. Suppose that the mission
consists of three phases and that the load
profile demands that at least one equipment
must operate during the first phase, at least
two of the three must operate during the
second phase, and all three equipments must
operate during the final phase. Then the reliability
block diagram will be radically different
for each phase, although the equipment
configuration will not change (Figure 4-9).
Further, it is possible for such a system to
complete one phase reliably, yet be unavailable
to begin the next.

In  choosing  among  various  block  diagram 
formats  it  should  be  remembered  that  a  block 
diagram  should  provide  quick  and  easy  insight 
into  the  logical  relationships  that  determine 
the  success  or  failure  of  the  system  being 
modeled. Figures 4-10 and 4-11 show two
block diagrams of a typical trip function.
Most would agree that the diagram in figure
4-11 is more quickly and easily understood
than its equivalent shown in figure 4-10.

4.2.2.1.2 Mathematical Models - Reliability

A mathematical model is an algebraic
analog of the block diagram. It is prepared by
review of the block diagram. In developing
reliability models of the attribute type, three
assumptions are usually applied to elements
of the system: 1) only two element states are
recognized, operable or failed; 2) repair is
not considered; a failed element is considered
to remain failed for the duration of the mission;
3) elements are statistically independent;
failure of a given element does not affect the
probability of failure of any other element.


Figure 4-9. Reliability Model by Mission Phases
(panels: Phase 1, Phase 2, Phase 3)

Figure 4-10. Reliability Block Diagram -
Circuit Breaker Trip Function

Figure 4-11. Alternative Reliability Block Diagram -
Circuit Breaker Trip Function


Under these assumptions $R_A + Q_A = 1$,
where $R_A$ is the reliability of element A, the
event probability $P(A)$, and $Q_A$ is the unreliability
of element A, the event probability
$P(\bar{A})$. If an element is used N times,
$(R + Q)^N = 1$, and if N different items are used,

$$ \prod_{i=1}^{N} (R_i + Q_i) = 1. $$

4.2.2.1.2.1 Directly Written Models

Where the block diagram reflects simple
series-parallel logic, a reliability model can be
written using well known combinatorial equations.

For series related elements,

$$ R_S = R_A \cdot R_B \cdot R_C \cdots = \prod_{\text{all } i} R_i \qquad (4\text{-}1) $$

and for parallel redundant elements where
one must operate,

$$ R_S = 1 - [(1-R_A)(1-R_B)(1-R_C)\cdots] = 1 - \prod_{\text{all } i} (1-R_i) \qquad (4\text{-}2) $$

If the elements are identical and there are n of
them, equations (4-1) and (4-2) reduce respectively
to

$$ R_S = R^n \qquad (4\text{-}3) $$

and

$$ R_S = 1 - (1-R)^n \qquad (4\text{-}4) $$

For n identical parallel redundant elements,
any m of which are required to operate,

$$ R_S = \sum_{x=m}^{n} \binom{n}{x} R^x (1-R)^{n-x} \qquad (4\text{-}5) $$

where

$$ \binom{n}{x} = \frac{n!}{x!\,(n-x)!}, \qquad n! = (n)(n-1)\cdots(1), \qquad 0! = 1. $$
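
Equations (4-1), (4-2) and (4-5) translate directly into short functions. The Python sketch below is illustrative only; the function names and the element reliability of 0.9 are assumptions, and statistical independence of the elements is assumed throughout, as in the text.

```python
import math
from typing import Sequence


def series_reliability(reliabilities: Sequence[float]) -> float:
    """Equation (4-1): every element must operate."""
    return math.prod(reliabilities)


def parallel_reliability(reliabilities: Sequence[float]) -> float:
    """Equation (4-2): at least one element must operate."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


def m_out_of_n_reliability(r: float, n: int, m: int) -> float:
    """Equation (4-5): at least m of n identical elements must operate."""
    if not 0 <= m <= n:
        raise ValueError("m must satisfy 0 <= m <= n")
    return sum(
        math.comb(n, x) * r**x * (1.0 - r) ** (n - x) for x in range(m, n + 1)
    )


# Three identical equipments with an assumed reliability of 0.9 each.
r = 0.9
one_of_three = m_out_of_n_reliability(r, n=3, m=1)    # same as equation (4-4)
two_of_three = m_out_of_n_reliability(r, n=3, m=2)
three_of_three = m_out_of_n_reliability(r, n=3, m=3)  # same as equation (4-3)
```

Setting m = 1 or m = n recovers the parallel and series cases, which gives a quick consistency check on an implementation of equation (4-5).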

The phase models corresponding to figure
4-9 can easily be written for phases 1 and 3
using equations (4-2) and (4-1) respectively.
With the three equipments labeled C, D and E
as in figure 4-12,

Phase 1:  $$ R_S = 1 - (1-R_C)(1-R_D)(1-R_E) $$

Phase 3:  $$ R_S = R_C R_D R_E $$

But none of the equations above is adequate
for phase 2, although equation (4-5) would
solve phase 2 if all the elements were identical.
The equation for phase 2 is developed
using Binomial Modeling concepts in
§ 4.2.2.1.2.2 and is presented in figure 4-12.

4.2.2.1.2.2 Binomial Modeling

Models  involving  two-state  components  can 
always  be  written  using  the  binomial  relation¬ 
ship  R  +  Q  =  1 ,  but  the  process  becomes 
tedious  as  the  number  of  components  in¬ 
creases.  A  system  comprised  of  n  two-state 
elements can assume any of $2^n$ unique states.
Some  of  them  correspond  to  system  success, 
others  to  system  failure. 

It is assumed that every component begins
its mission in an operable condition and can
complete the mission in either of two states,
operable or failed. The probability that it will
complete the mission in an operable state is
its reliability R; the probability that it will
complete the mission in a failed state is
1 - R = Q. If a system consists of two components,
the system can complete its mission
in any one of four states. If the components
are statistically independent, the state probabilities
can be enumerated by a simple expansion.

$$ (R_1 + Q_1)(R_2 + Q_2) = R_1 R_2 + R_1 Q_2 + R_2 Q_1 + Q_1 Q_2 = 1 $$

The  four  terms  of  the  expansion  represent 
respectively  the  joint  probabilities  that  both 
components  will  be  successful,  that  com¬ 
ponent  1  will  be  successful  and  component  2 
will  fail,  that  component  2  will  be  successful 
and  component  1  will  fail,  and  that  both  com¬ 
ponents  will  fail.  Because  the  four  states  col¬ 
lectively  exhaust  all  possibilities,  the  probabil¬ 
ities  add  to  unity. 

If  the  components  are  serially  related  in 
the  system,  only  the  full  success  state  repre¬ 
sents  system  success  and  the  first  term  of  the 
expansion  is  the  system  reliability  model. 
However,  if  the  components  are  related  in 
active  parallel  redundancy,  three  of  the  four 
states  represent  system  success  and  only  the 
total  failure  state  corresponds  to  system 
failure.  In  this  case  the  first  three  terms  of  the 
expression  form  the  system’s  reliability 
model. 

It  can  readily  be  seen  that  a  system  of  three 
components  can  complete  a  mission  in  any 
one  of  eight  possible  states.  Dropping  sub¬ 
scripts  for  the  sake  of  notational  brevity  gives 

$$ (R + Q)^3 = R^3 + 3R^2 Q + 3RQ^2 + Q^3 = 1 $$

The  respective  terms  represent  the  probabil¬ 
ities  of  the  one  state  in  which  all  three  com¬ 
ponents  complete  the  mission  successfully, 
three  states  in  which  two  components  are 
successful  while  one  component  fails,  three 
states  in  which  one  component  is  successful 
while  two  components  fail,  and  one  state  in 
which  all  three  components  fail.  Again,  sys¬ 
tem  reliability  depends  on  how  many  of  these 
eight  possible  states  correspond  to  a  success¬ 
ful  mission. 

For example, in the three component system
described in figure 4-9 Phase 2, the possible
system states are shown in figure 4-12;
four of them represent system success. In this
example two of the three components are
required for system success. Therefore, all
states with two or more S's represent system
success; those with two or more F's represent
system failure. The reliability model for
figure 4-9 Phase 2 can now be written by summing
the four system success states or by subtracting
the sum of the system's four failed
states from unity. Using the four success
states we have:

$$ R_S = R_C R_D R_E + (1-R_C) R_D R_E + R_C (1-R_D) R_E + R_C R_D (1-R_E) $$

State                                  Component
No.    Description                     C    D    E    System

1      1 way - 3 succeed               S    S    S    S
2      3 ways - 2 succeed              F    S    S    S
3      and 1 fail                      S    F    S    S
4                                      S    S    F    S
5      3 ways - 1 succeeds             F    F    S    F
6      and 2 fail                      F    S    F    F
7                                      S    F    F    F
8      1 way - 3 fail                  F    F    F    F

Figure 4-12. List of Possible States for the Three Component System Depicted in Figure 4-9, Phase 2
4.2.2.1.2.3 Conditional Probability Modeling

It  is  often  convenient  to  simplify  models 
that  lack  the  simple  series-parallel  structure 
by  using  the  relationship: 

Rs  =  P(S1A)  •  P( A)  +  P(SIA)  •  P(A)  (4-6) 

This  statement  indicates  that  the  reliability 
of  the  system  is  the  probability  that  the  sys¬ 
tem  works  given  that  A  works,  times  the 
probability  that  A  works,  plus  the  probability 
that  the  system  works  given  that  A  fails,  times 
the  probability  that  A  fails.  A  can  be  an  el¬ 
ement  or  group  of  elements  in  the  system. 
For example, in figure 4-13, given that A does not fail, the system succeeds if either D or E does not fail.

$$P(S \mid A) P(A) = [1 - (1 - R_D)(1 - R_E)] R_A$$


Figure  4-13.  System  Model 

Given  that  A  fails,  the  system  can  only  suc¬ 
ceed  if  either  B  and  D  or  C  and  E  do  not  fail. 

$$P(S \mid \bar{A}) P(\bar{A}) = [1 - (1 - R_B R_D)(1 - R_C R_E)] (1 - R_A)$$

Thus  the  system  reliability  model  is  the  sum 
of  both  expressions. 

$$R_S = [1 - (1 - R_D)(1 - R_E)] R_A + [1 - (1 - R_B R_D)(1 - R_C R_E)] (1 - R_A)$$

MIL-HDBK-217 [2] discusses system reliability modeling.
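The conditional decomposition can be compared with a brute-force enumeration of all component states. This is a sketch only: figure 4-13 is not reproduced here, so the success logic in `system_up` is an assumption inferred from the two conditional statements in the text, and the reliabilities are illustrative.

```python
from itertools import product

def system_up(s):
    # Assumed logic for figure 4-13: with A up, D or E suffices;
    # with A down, B and D, or C and E, are needed.
    return ((s["A"] or s["B"]) and s["D"]) or ((s["A"] or s["C"]) and s["E"])

def reliability_by_enumeration(r):
    names = list(r)
    total = 0.0
    for bits in product([True, False], repeat=len(names)):
        s = dict(zip(names, bits))
        if system_up(s):
            p = 1.0
            for n in names:
                p *= r[n] if s[n] else 1.0 - r[n]
            total += p
    return total

def reliability_by_conditioning(r):
    given_a_up = 1 - (1 - r["D"]) * (1 - r["E"])
    given_a_down = 1 - (1 - r["B"] * r["D"]) * (1 - r["C"] * r["E"])
    return given_a_up * r["A"] + given_a_down * (1 - r["A"])

# Illustrative values (assumed, not from the manual).
r = {"A": 0.95, "B": 0.90, "C": 0.85, "D": 0.80, "E": 0.75}
by_states = reliability_by_enumeration(r)
by_conditioning = reliability_by_conditioning(r)
```

Conditioning on A turns a structure that is not series-parallel into two series-parallel ones, which is why the two results agree.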

4.2.2.1.2.4 Models by Minimum Cuts Method

Generally  it  is  easy  to  write  directly  the 
reliability  model  for  a  system  of  simple  con¬ 
figuration  intended  for  a  single-phase  mission. 
But when the system lacks the series-parallel structure or the mission consists of multiple phases, the model may be much more difficult to write and to evaluate. It may not be sufficient to solve such a model for each phase independently of the other phases, since the conditional reliability thus found is the probability of success in a particular phase, given that all components of the system are operable at the beginning of the phase. For redundant systems, this condition may not be met, as it is possible to enter any but the first phase with some elements failed but the system operable. Under these conditions a system may complete its current phase reliably, but be unavailable to begin the next phase. What is sought is the probability of system success in any phase, conditioned on success in the preceding phase, rather than conditioned on all elements beginning the current phase unfailed. The method of minimum cuts [3, 4 pages 136-139, 5 pages 329-338] is a powerful analytical tool for treating this class of problems. Dr. C. Persels' approach [3] is followed in this manual.

A  cut  is  defined  as  a  group  of  components 
which,  if  all  fail,  will  fail  the  system.  A  mini¬ 
mum  cut  is  a  cut  having  the  property  that  if 
any  failed  component  is  analytically  deleted 
from  the  cut  the  remaining  components  no 
longer  comprise  a  cut. 


The  steps  in  the  minimum  cut  method  are: 

1. Find the minimal cuts for each phase

2.  Combine  phase  minimal  cuts  into  mis¬ 
sion  minimal  cuts 

3.  Group  dependent  mission  minimal  cuts 

4.  Find  the  probability  that  at  least  one 
minimal  cut  in  a  group  will  fail 

5.  Combine  group  probabilities 

An  example  of  a  two  phase  system  (Fig¬ 
ure  4-14)  is  carried  along  to  illustrate  the 
method.  Subscripts  denote  phases.  It  can  be 
seen that, for example, failure of component E in phase 1 would not fail the system in phase 1, but would render it incapable of performing phase 2.

Finding  the  Minimal  Cuts  for  Each  Phase 

The  minimal  cuts  for  each  phase  shown  in 
figure  4-14  are  easily  obtained  by  inspection. 
The  formal  procedure  is: 

1.  Obtain  Boolean  expression  for  system 
success 

2.  Complement  the  expression 

3.  Place  in  disjunctive  form 

4.  Expressions  between  the  ORs  are  the 
minimal  cuts. 

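Subscripts in figure 4-14 denote the mission phase. Thus $\bar{A}_1$ is the event that component A fails in phase 1. We will let $c_i^j$ stand for cut j in phase i.

The Boolean expressions are:

Phase 1

$$S_1 = [A_1 \cup B_1] \cap [C_1] \cap [D_1 \cup (E_1 \cap F_1)]$$

Phase 2

$$S_2 = [A_2 \cup F_2] \cap C_2 \cap E_2$$

Their complements are:

$$\bar{S}_1 = [\bar{A}_1 \cap \bar{B}_1] \cup [\bar{C}_1] \cup [\bar{D}_1 \cap (\bar{E}_1 \cup \bar{F}_1)]$$

$$\bar{S}_2 = [\bar{A}_2 \cap \bar{F}_2] \cup \bar{C}_2 \cup \bar{E}_2$$

In the disjunctive form:

$$\bar{S}_1 = [\bar{A}_1 \cap \bar{B}_1] \cup [\bar{C}_1] \cup [\bar{D}_1 \cap \bar{E}_1] \cup [\bar{D}_1 \cap \bar{F}_1]$$

$$\bar{S}_2 = [\bar{A}_2 \cap \bar{F}_2] \cup \bar{C}_2 \cup \bar{E}_2$$

Therefore the minimal cuts for Phase 1 are:

$$c_1^1 = [\bar{A}_1 \cap \bar{B}_1]$$
$$c_1^2 = [\bar{C}_1]$$
$$c_1^3 = [\bar{D}_1 \cap \bar{E}_1]$$
$$c_1^4 = [\bar{D}_1 \cap \bar{F}_1]$$

and the minimal cuts for Phase 2 are:

$$c_2^1 = [\bar{A}_2 \cap \bar{F}_2]$$
$$c_2^2 = [\bar{C}_2]$$  (4-7)
$$c_2^3 = [\bar{E}_2]$$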

Probability  of  system  failure  is  equal  to  the 
probability  that  all  of  the  elements  in  at  least 
one  of  the  minimal  cuts  will  fail.  Probability 
of system success in a phase is equal to 1 - P(all components in at least one cut fail). Thus,

$$P(S_1) = 1 - P(c_1^1 \cup c_1^2 \cup c_1^3 \cup c_1^4)$$

$$P(S_2) = 1 - P(c_2^1 \cup c_2^2 \cup c_2^3)$$
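For small systems the phase minimal cuts can also be found by brute force rather than by Boolean manipulation. The sketch below is an assumption-laden illustration: it encodes the two success expressions given above as Python functions (`phase1`, `phase2`, both hypothetical names), and it relies on the system being coherent, meaning that failing an additional component never restores success.

```python
from itertools import combinations

def minimal_cuts(components, success):
    """Return the minimal sets of failed components that fail the system."""
    cuts = []
    for size in range(1, len(components) + 1):
        for failed in combinations(components, size):
            failed = frozenset(failed)
            if any(c <= failed for c in cuts):
                continue                      # contains a smaller cut, not minimal
            up = {c: c not in failed for c in components}
            if not success(up):
                cuts.append(failed)
    return cuts

def phase1(up):
    return (up["A"] or up["B"]) and up["C"] and (up["D"] or (up["E"] and up["F"]))

def phase2(up):
    return (up["A"] or up["F"]) and up["C"] and up["E"]

cuts1 = minimal_cuts("ABCDEF", phase1)
cuts2 = minimal_cuts("ACEF", phase2)
```

Because candidate sets are visited in order of increasing size, any set that already contains a recorded cut is skipped, so only minimal cuts remain.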

Combining  Phase  Minimal  Cuts  into  Mission 
Minimal  Cuts 

For  a  multiphase  mission,  minimal  cuts 


must  be  found  that  represent  the  system  from 
initiation  of  the  mission  through  the  phase 
being evaluated. These are called mission
minimum  cuts,  and  are  designated  by  capital 
C.  They  are  found  by  combining  previous 
mission  minimum  cuts  with  current  phase 
minimum  cuts  obtained  for  each  phase  with¬ 
out  regard  to  any  other  phase. 

Since  phase  1  is  the  first  phase,  mission 
cuts  are  identical  to  the  phase  cuts. 


Superscripts  are  ordinal  indices  used  to  dis¬ 
tinguish  among  cuts. 

Mission cuts through phase 2 are obtained by combining phase 2 cuts with previous mission cuts (i.e., with phase 1 cuts). A cut must appear once and only once on the list. Therefore, three rules are followed:

1. Current phase cuts become current mission cuts. (In the notation used below, the symbol → is read as "becomes").

2.  Any  previous  mission  cut  which  in¬ 
cludes  a  current  phase  cut  is  dropped.  (This 
prevents  the  cut  from  being  listed  twice). 

3.  Any  previous  mission  cut  which  does 
not  contain  a  current  phase  cut  becomes  a 
current  mission  cut.  (This  assures  that  the  cut 
will  continue  to  be  listed  exactly  once). 

In the previous example,

$c_1^1 \rightarrow C_2^1$  (rule 3)

$c_1^2$ is dropped and $c_2^2 \rightarrow C_2^2$  (rule 2)

$c_1^3$ is dropped and $c_2^3 \rightarrow C_2^3$  (rule 2)

$c_1^4 \rightarrow C_2^4$  (rule 3)

$c_2^1 \rightarrow C_2^5$  (rule 1)

In summary:

$$C_2^1 = \bar{A}_1 \cap \bar{B}_1$$
$$C_2^2 = \bar{C}_2$$
$$C_2^3 = \bar{E}_2$$
$$C_2^4 = \bar{D}_1 \cap \bar{F}_1$$
$$C_2^5 = \bar{A}_2 \cap \bar{F}_2$$

represent the mission minimal cuts for the two phase mission illustrated in figure 4-14. Additional phases are processed in like manner.
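The three combination rules can be sketched in a few lines. This is an illustration under a stated assumption: a previous mission cut is taken to "include" a current phase cut when it involves every component of that phase cut, regardless of phase subscript (a component failed in phase 1 is still failed in phase 2). The helpers `cut` and `combine_cuts` are hypothetical names.

```python
def cut(text, phase):
    """Build a cut as a frozenset of (component, phase) failure events."""
    return frozenset((c, phase) for c in text)

def combine_cuts(previous_mission_cuts, current_phase_cuts):
    def components(c):
        return {comp for comp, _phase in c}

    # Rules 2 and 3: keep a previous mission cut only if it does not
    # include (component-wise) any current phase cut.
    kept = [
        prev for prev in previous_mission_cuts
        if not any(components(cur) <= components(prev) for cur in current_phase_cuts)
    ]
    # Rule 1: every current phase cut becomes a current mission cut.
    return kept + list(current_phase_cuts)

phase1_cuts = [cut("AB", 1), cut("C", 1), cut("DE", 1), cut("DF", 1)]
phase2_cuts = [cut("AF", 2), cut("C", 2), cut("E", 2)]
mission_cuts = combine_cuts(phase1_cuts, phase2_cuts)
```

Applied to the example, the phase 1 cuts on C and on D and E are dropped, leaving the same five mission cuts as the summary above.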

The  evaluation  of  mission  cuts  through  a 
given  phase  i  will  give  the  unconditional 
probability  of  subsystem  success  through 
that  phase. 

Grouping  Dependent  Mission  Minimal  Cuts 

It  is  an  infrequent  circumstance  that  all 
mission  cuts  are  independent.  But  they  can 
be  collected  into  independent  or  disjoint 
groups,  each  group  containing  only  mission 
cuts  having  components  in  common.  Each 
cut  in  a  group  “chains”  to  at  least  one  other 
mission  cut  in  the  same  group  by  having  one 
or  more  components  in  common  with  it.  In 
the  example  under  discussion, 

Group 1 = $[C_2^1, C_2^5, C_2^4] = [(\bar{A}_1 \cap \bar{B}_1), (\bar{A}_2 \cap \bar{F}_2), (\bar{D}_1 \cap \bar{F}_1)]$

Group 2 = $[C_2^2] = [\bar{C}_2]$

Group 3 = $[C_2^3] = [\bar{E}_2]$

The reliability of the system for the two phase mission is:

$$P(S) = \prod_{i=1}^{N} [1 - P(G_i)]$$  (4-8)

where N is three in this case since there are three groups.
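Grouping amounts to finding the connected sets of cuts that chain together through shared components. A minimal sketch follows, assuming that cuts are represented as sets of (component, phase) pairs and that two cuts are dependent whenever they share a component letter in any phase; `group_cuts` and `cut` are hypothetical helper names.

```python
def cut(text, phase):
    return frozenset((c, phase) for c in text)

def group_cuts(cuts):
    """Partition cuts into groups whose members chain through shared components."""
    groups = []                                # entries: (component set, list of cuts)
    for c in cuts:
        comps = {comp for comp, _phase in c}
        merged = [c]
        remaining = []
        for g_comps, g_cuts in groups:
            if g_comps & comps:                # shares a component: merge the group
                comps |= g_comps
                merged += g_cuts
            else:
                remaining.append((g_comps, g_cuts))
        groups = remaining + [(comps, merged)]
    return [g_cuts for _comps, g_cuts in groups]

mission_cuts = [cut("AB", 1), cut("C", 2), cut("E", 2), cut("DF", 1), cut("AF", 2)]
groups = group_cuts(mission_cuts)
```

For the example this yields one group of three cuts (linked through A and F) and two single-cut groups, matching Groups 1 to 3.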

Finding  The  Probability  That  At  Least  One 
Cut  In  A  Group  Will  Fail 

The number of cuts in a group is in general N. To determine the probability that at least one cut in a group will fail, we sum the N first order terms and subtract the $\binom{N}{2}$ second order terms (all combinations of cuts taken two at a time), we add the $\binom{N}{3}$ third order terms, proceeding with alternating algebraic signs until we come to the one, $\binom{N}{N}$, Nth order term.

For the first group we would have:

$$\begin{aligned}
P(G_1) &= P[C_2^1 \cup C_2^5 \cup C_2^4] \\
&= [P(C_2^1) + P(C_2^5) + P(C_2^4)] \\
&\quad - [P(C_2^1 \cap C_2^5) + P(C_2^1 \cap C_2^4) + P(C_2^5 \cap C_2^4)] \\
&\quad + P(C_2^1 \cap C_2^5 \cap C_2^4)
\end{aligned}$$

This can also be written as:

$$\begin{aligned}
P(G_1) &= \{P(\bar{A}_1 \cap \bar{B}_1) + P(\bar{A}_2 \cap \bar{F}_2) + P(\bar{D}_1 \cap \bar{F}_1)\} \\
&\quad - \{P[(\bar{A}_1 \cap \bar{B}_1) \cap (\bar{A}_2 \cap \bar{F}_2)] + P[(\bar{A}_1 \cap \bar{B}_1) \cap (\bar{D}_1 \cap \bar{F}_1)] + P[(\bar{A}_2 \cap \bar{F}_2) \cap (\bar{D}_1 \cap \bar{F}_1)]\} \\
&\quad + P[(\bar{A}_1 \cap \bar{B}_1) \cap (\bar{A}_2 \cap \bar{F}_2) \cap (\bar{D}_1 \cap \bar{F}_1)]
\end{aligned}$$  (4-9)

Note for group 1, N = 3. Therefore, there are $\binom{3}{1}$ or 3x2x1/((1) x (2x1)) or 3 first order terms, $\binom{3}{2}$ or 3x2x1/((2x1) x (1)) or 3 second order terms, and $\binom{3}{3}$ or 1 third order term. The algebraic signs alternate: plus for the first order terms, minus for the second order terms and plus for the third order terms. In this example, both group 2 and group 3 have only one term, N = 1:

$$P(G_2) = P(C_2^2) = P[\bar{C}_2]$$  (4-10)

$$P(G_3) = P(C_2^3) = P[\bar{E}_2]$$  (4-11)

Combining  Group  Probabilities 

$$P(\bar{S}) = P[G_1 \cup G_2 \cup G_3 \cup \cdots \cup G_N]$$

The number of groups is in general N. To determine the probability that at least one group will fail, sum the N first order terms and subtract the $\binom{N}{2}$ second order terms (all combinations of groups taken two at a time), add the $\binom{N}{3}$ third order terms, proceeding with alternating algebraic signs until the one, $\binom{N}{N}$, Nth order term is reached.

Since there are three independent groups in the system:

$$\begin{aligned}
P(\bar{S}) &= [P(G_1) + P(G_2) + P(G_3)] \\
&\quad - \{P[G_1 \cap G_2] + P[G_1 \cap G_3] + P[G_2 \cap G_3]\} \\
&\quad + P[G_1 \cap G_2 \cap G_3]
\end{aligned}$$  (4-12)

It  should  be  noted  that  this  expression 
has  seven  terms.  If  the  five  mission  cuts  had 
been  used  directly,  without  grouping  depen¬ 
dent  cuts,  (a  permissible  approach),  there 
would have been thirty-one terms in the expression for $P(\bar{S})$: $\binom{5}{1}$ or 5 first order, 10 second order, 10 third order, 5 fourth order and 1 fifth order terms.

If  a  solution  for  the  first  phase  is  desired, 
it  is  efficient  to  group  the  cuts  in  the  first 
phase.  The  groups  for  phase  1  would  be: 

Group 1 = $[C_1^1] = [\bar{A}_1 \cap \bar{B}_1]$

Group 2 = $[C_1^2] = [\bar{C}_1]$  (4-13)

Group 3 = $[C_1^3, C_1^4] = [(\bar{D}_1 \cap \bar{E}_1), (\bar{D}_1 \cap \bar{F}_1)]$

The probability of system success is:

$$P(S) = 1 - P(\bar{S})$$

Approximations 

A conservative estimate of system reliability for a multiphase mission is available by truncating the expression for $P(\bar{S})$ at any negative sign. The expression was:

$$\begin{aligned}
P(\bar{S}) &= [P(G_1) + P(G_2) + P(G_3)] \\
&\quad - \{P[G_1 \cap G_2] + P[G_1 \cap G_3] + P[G_2 \cap G_3]\} \\
&\quad + P[G_1 \cap G_2 \cap G_3]
\end{aligned}$$

The approximation is obtained by truncating:

$$P(\bar{S}) \approx P(G_1) + P(G_2) + P(G_3)$$

$$P(S) = 1 - P(\bar{S})$$

The error will be less than the first term eliminated, in this case:

$$-\{P[G_1 \cap G_2] + P[G_1 \cap G_3] + P[G_2 \cap G_3]\}$$
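The effect of truncation can be illustrated numerically. The sketch below assumes three independent groups and uses the group failure probabilities 0.0202, 0.1168 and 0.1173 that the manual's numerical example arrives at; `union_terms` is a hypothetical helper.

```python
from itertools import combinations
from math import prod

def union_terms(p):
    """Signed inclusion-exclusion terms, by order, for independent events."""
    n = len(p)
    return [
        (-1) ** (k + 1) * sum(prod(c) for c in combinations(p, k))
        for k in range(1, n + 1)
    ]

p_groups = [0.0202, 0.1168, 0.1173]
terms = union_terms(p_groups)

exact_failure = sum(terms)          # all three orders
truncated_failure = terms[0]        # cut off before the first negative term

exact_reliability = 1 - exact_failure
conservative_reliability = 1 - truncated_failure
error = truncated_failure - exact_failure
```

Truncating before a negative term overstates the failure probability, so the resulting reliability estimate errs on the low side, and the error stays below the magnitude of the first dropped term.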

Numerical  Example 

In  the  model  illustrated  in  figure  4-14, 
assume  that  the  probabilities  of  success  for 
each  element  in  phase  1  are: 

P(A_1) = 0.9500
P(B_1) = 0.9000
P(C_1) = 0.9600
P(D_1) = 0.9300
P(E_1) = 0.9700
P(F_1) = 0.9400

The probabilities of success of each element in phase 2, given that the element has survived phase 1, are:

$P(A_2 \mid A_1) = 0.9700$
$P(E_2 \mid E_1) = 0.9100$
$P(F_2 \mid F_1) = 0.9000$
$P(C_2 \mid C_1) = 0.9200$

Then the following complementary probabilities are true:

$P(\bar{A}_1) = 0.0500$
$P(\bar{B}_1) = 0.1000$
$P(\bar{C}_1) = 0.0400$
$P(\bar{D}_1) = 0.0700$
$P(\bar{E}_1) = 0.0300$
$P(\bar{F}_1) = 0.0600$
$P(\bar{A}_2 \mid A_1) = 0.0300$
$P(\bar{E}_2 \mid E_1) = 0.0900$
$P(\bar{F}_2 \mid F_1) = 0.1000$
$P(\bar{C}_2 \mid C_1) = 0.0800$

Because unconditional probabilities are used in the minimum cuts method, the unconditional probabilities of phase 2 are needed and are as follows:

$$\begin{aligned}
P(A_2) &= P(A_2 \mid A_1) P(A_1) + P(A_2 \mid \bar{A}_1) P(\bar{A}_1) \\
&= (0.9700)(0.9500) + (0)(0.0500) = 0.9215 \\
P(\bar{A}_2) &= 1 - P(A_2) = 0.0785
\end{aligned}$$

$$\begin{aligned}
P(F_2) &= P(F_2 \mid F_1) P(F_1) + P(F_2 \mid \bar{F}_1) P(\bar{F}_1) \\
&= (0.9000)(0.9400) + (0)(0.0600) = 0.8460 \\
P(\bar{F}_2) &= 1 - P(F_2) = 0.1540
\end{aligned}$$

$$\begin{aligned}
P(C_2) &= P(C_2 \mid C_1) P(C_1) + P(C_2 \mid \bar{C}_1) P(\bar{C}_1) \\
&= (0.9200)(0.9600) + (0)(0.0400) = 0.8832 \\
P(\bar{C}_2) &= 1 - P(C_2) = 0.1168
\end{aligned}$$

$$\begin{aligned}
P(E_2) &= P(E_2 \mid E_1) P(E_1) + P(E_2 \mid \bar{E}_1) P(\bar{E}_1) \\
&= (0.9100)(0.9700) + (0)(0.0300) = 0.8827 \\
P(\bar{E}_2) &= 1 - P(E_2) = 0.1173
\end{aligned}$$

Exact reliability of the two-phase mission, using equation (4-9) for Group 1:

$$\begin{aligned}
P(G_1) &= P(\bar{A}_1) P(\bar{B}_1) + P(\bar{A}_2) P(\bar{F}_2) + P(\bar{D}_1) P(\bar{F}_1) \\
&\quad - P(\bar{B}_1) P(\bar{A}_1) P(\bar{F}_2) \\
&\quad - P(\bar{A}_1) P(\bar{B}_1) P(\bar{D}_1) P(\bar{F}_1) \\
&\quad - P(\bar{A}_2) P(\bar{F}_1) P(\bar{D}_1) \\
&\quad + P(\bar{B}_1) P(\bar{A}_1) P(\bar{F}_1) P(\bar{D}_1)
\end{aligned}$$

$$P(G_1) = 0.0202$$

Similarly, from equation (4-10), $P(G_2) = P(\bar{C}_2) = 0.1168$. From equation (4-11), $P(G_3) = P(\bar{E}_2) = 0.1173$.

Finally, using equation (4-8), which reflects the fact that the groups are independent, the two-phase mission reliability $R_S$ is

$$\begin{aligned}
R_S &= [1 - P(G_1)] [1 - P(G_2)] [1 - P(G_3)] \\
&= (0.9798)(0.8832)(0.8827) \\
&= 0.7639
\end{aligned}$$  (4-14)
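The numerical example can be reproduced with a short script. This is a sketch under stated assumptions: components fail independently of one another, a component that fails in phase 1 remains failed in phase 2 (so in any intersection only the earliest phase of each component matters), and the helper names `cut`, `p_all_fail` and `p_group` are hypothetical.

```python
from itertools import combinations
from math import prod

# Phase 1 success probabilities and conditional phase 2 probabilities from the text.
p1 = {"A": 0.95, "B": 0.90, "C": 0.96, "D": 0.93, "E": 0.97, "F": 0.94}
p2_given_1 = {"A": 0.97, "E": 0.91, "F": 0.90, "C": 0.92}

# Unconditional failure probability of each (component, phase) event.
q = {(c, 1): 1 - p for c, p in p1.items()}
q.update({(c, 2): 1 - p2_given_1[c] * p1[c] for c in p2_given_1})

def cut(text, phase):
    return frozenset((c, phase) for c in text)

def p_all_fail(cuts):
    """P(every cut in `cuts` fails), keeping the earliest phase per component."""
    earliest = {}
    for c in cuts:
        for comp, phase in c:
            earliest[comp] = min(phase, earliest.get(comp, phase))
    return prod(q[(comp, phase)] for comp, phase in earliest.items())

def p_group(cuts):
    """Inclusion-exclusion probability that at least one cut in the group fails."""
    return sum(
        (-1) ** (k + 1) * sum(p_all_fail(combo) for combo in combinations(cuts, k))
        for k in range(1, len(cuts) + 1)
    )

groups = [[cut("AB", 1), cut("AF", 2), cut("DF", 1)], [cut("C", 2)], [cut("E", 2)]]
p_g = [p_group(g) for g in groups]
mission_reliability = prod(1 - p for p in p_g)
```

The three group probabilities and the mission reliability agree with the hand calculation to four decimal places.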

Multiplying  Phase  Reliabilities 

Since the phases of the two phase mission are not independent, it is not correct to multiply the reliability of phase 1 by the reliability of phase 2 to obtain mission reliability.

Phase 1

Groups for phase 1 are given in equation 4-13.

$$G_1 = \bar{A}_1 \cap \bar{B}_1$$
$$G_2 = \bar{C}_1$$
$$G_3 = [\bar{D}_1 \cap \bar{E}_1, \ \bar{D}_1 \cap \bar{F}_1]$$

Phase 2

Groups for phase 2 are the minimal cuts of equation 4-7.

$$G_1 = \bar{A}_2 \cap \bar{F}_2$$
$$G_2 = \bar{C}_2$$
$$G_3 = \bar{E}_2$$

Then, for phase 1

$$P(G_1) = P(\bar{A}_1) P(\bar{B}_1) = 0.0050$$

$$P(G_2) = P(\bar{C}_1) = 0.0400$$

$$\begin{aligned}
P(G_3) &= P[(\bar{D}_1 \cap \bar{E}_1) \cup (\bar{D}_1 \cap \bar{F}_1)] \\
&= P(\bar{D}_1) P(\bar{E}_1) + P(\bar{D}_1) P(\bar{F}_1) - P(\bar{D}_1) P(\bar{E}_1) P(\bar{F}_1) = 0.0062
\end{aligned}$$

and for phase 2

$$P(G_1) = P(\bar{A}_2) P(\bar{F}_2) = 0.0121$$

$$P(G_2) = P(\bar{C}_2) = 0.1168$$

$$P(G_3) = P(\bar{E}_2) = 0.1173$$

and,

R(Phase 1) = (0.9950)(0.9600)(0.9938) = 0.9493

R(Phase 2) = (0.9879)(0.8832)(0.8827) = 0.7702

The product (0.9493)(0.7702) is 0.7312, which underestimates the true mission reliability of 0.7639 (equation 4-14), a significant underestimate.

4.2.2.2 Availability Modeling

An availability model indicates the manner in which the availability of a system depends on the reliability of the system's constituent elements and on their maintenance characteristics. (Note: as was indicated in § 2.2.1, this manual treats availability from a corrective maintenance viewpoint.) This availability model consists of a block diagram and one or more mathematical equations.

The functional configuration of the system in each of its operating modes is defined by means of block diagrams, normally one diagram for each mode. In the diagram each block represents an equipment or group of equipments. The directions of functional flows are labeled and inputs and outputs are identified. Thus a functional block diagram is a graphical representation of the dependence of system performance on the operability of its hardware elements. In addition, an equipment "tree" diagram, based on packaging rather than functional relationships, should be supplied, detailing hardware down to and including the component level. Figure A-1 is a typical system block diagram. The system is completely analyzed in Appendix A.

4.2.2.2.1 Availability Block Diagrams

The block diagrams used in availability analysis are essentially the same as those described in § 4.2.2.1.1.

4.2.2.2.2 Mathematical Models - Availability

The models of § 4.2.2.1.2 are applicable to availability with the appropriate changes in symbols if the block can be repaired during the mission phase [i.e., A (availability) for R (reliability) and U (unavailability) for Q (unreliability)].

A subsystem consisting of n equipments in series is available when all n equipments are available. Thus, if

$$A_i = \frac{MTBF_i}{MTBF_i + MTTR_i}$$  (4-15)

is the availability of the ith equipment, and if all equipments fail and are repaired independently, the availability of the subsystem is

$$A = \prod_{i=1}^{n} A_i$$  (4-16)

The availability of a subsystem of n identical equipments in parallel, any m of which are required to be up for the system to be up, is given by

$$A = \sum_{x=0}^{n-m} \binom{n}{x} A_i^{\,n-x} (1 - A_i)^{x}$$  (4-17)

where

$$\binom{n}{x} = \frac{n!}{(n-x)! \, x!}$$
For a system such as

where repair is not permitted, the MTBF (figure F-11) is

$$MTBF = 1.083333 / \lambda$$

The subsystem degrades as failures occur to a 2 of 3 and then a 2 of 2 subsystem before the subsystem goes down.

If repair is permitted, figure E-11 shows the MTBF to be

$$MTBF = 0.083333 \, \mu^2 / \lambda^3$$

When a failure occurs, repair is initiated immediately. The subsystem goes to 2 of 3 when the first failure occurs. It only goes to 2 of 2 if a second failure takes place before the first repair. For a repair rate of 1 repair/hour and a failure rate of .001 failures/hour, the MTBF of the subsystem without repair is 1,083.333 hours and the MTBF with repair is over 83 million hours.
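Equations (4-15) through (4-17) and the two MTBF figures can be evaluated directly. The following is a sketch, not the manual's procedure: the function names are hypothetical, the subsystem is assumed to be a 2 of 4 arrangement (inferred from the degradation sequence described in the text and from 1/4 + 1/3 + 1/2 = 1.083333), and the with-repair expression is taken from the text as given rather than derived.

```python
from math import comb, prod

def equipment_availability(mtbf, mttr):
    return mtbf / (mtbf + mttr)                      # equation (4-15)

def series_availability(avails):
    return prod(avails)                              # equation (4-16)

def m_of_n_availability(a, m, n):
    # Equation (4-17): at most n - m of the n identical equipments are down.
    return sum(comb(n, x) * a ** (n - x) * (1 - a) ** x for x in range(n - m + 1))

def mtbf_without_repair(lam, m, n):
    # Mean time spent with n, n-1, ..., m units up, each unit failing at rate lam.
    return sum(1.0 / (k * lam) for k in range(m, n + 1))

lam, mu = 0.001, 1.0                                 # failures/hour, repairs/hour
no_repair = mtbf_without_repair(lam, 2, 4)           # about 1,083.333 hours
with_repair = 0.083333 * mu**2 / lam**3              # expression quoted in the text

a_series = series_availability([0.9, 0.8])
a_2_of_3 = m_of_n_availability(0.9, 2, 3)
```

With the quoted rates, immediate repair raises the MTBF from roughly a thousand hours to tens of millions of hours.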

4.2.3  Reliability  and  Availability 
Apportionment 

Apportionment  is  the  process  by  which 
requirements  are  allocated  from  the  system 
level  to  lower  assembly  levels.  System  re¬ 
quirements  will  have  been  established  by  the 
procuring  activity  prior  to  full  scale  develop¬ 
ment. 

System  requirements  stem  from  or  relate 
to  mission  needs.  Usually  more  than  one 
approach  is  available  to  fulfill  a  need  or  to 
satisfy  a  mission  requirement.  Therefore, 


trade-offs  may  be  made  between  competing 
system  designs,  such  as  redundancy  or  derat¬ 
ing  approaches. 

It should be noted that requirements are being apportioned. Requirements should not be set arbitrarily, since changes in requirements normally incur a cost penalty.

4.2.3.1 Reliability Apportionment

Reliability  requirements  must  be  appor¬ 
tioned  prior  to  design.  Because  apportion¬ 
ment  is  properly  completed  before  the  de¬ 
sign,  apportionment  techniques  are  of  neces¬ 
sity  somewhat  subjective.  As  an  example, 
consider  a  hypothetical  re-entry  system  with 
design  requirements  as  shown  in  figure  4-15. 
When  the  apportionment  establishes  the  de¬ 
sign  requirement,  it  influences  design  deci¬ 
sions  such  as  redundancy,  part  quality  and 
electrical,  mechanical  and  thermal  derating 
philosophy. 

Essential  functions  of  the  system  include 
separation,  environmental  control,  decoy, 
maneuvering,  arming  and  fuzing,  and  attitude 
control.  Without  necessarily  defining  the 
hardware  concept  (e.g.,  the  attitude  control 
system  could  use  hot  gas,  cold  gas,  rocket, 
spring,  or  other  means  to  accomplish  its 
function)  to  be  used,  relative  weights  are 
assigned  to  each  function  in  the  categories  of 
criticality,  complexity,  design  maturity,  and 
severity  of  mission  profile.  Other  categories 
(e.g.,  MEC  codes  could  be  used  as  a  category) 
may be appropriate for certain systems. Relative weights (scores) are assigned on a 1 to 10 scale. In all categories except criticality, high scores represent more design difficulty; i.e., 10 is the most complex on the complexity scale, 10 is the least advanced on the design maturity scale, and 10 has the most difficult mission to survive on the mission profile scale. Criticality is a measure of the importance of a function to a successful mission. If a loss of the function aborts the mission, a score of 1 is appropriate; 10 implies little effect on the mission if the function is lost. While this scoring is reversed from the other categories, it is correct because it requires high reliability for elements that are most essential to the mission.

Reliability Specification (System): Objective .9512, Producer's Risk 20%; .8607, 20% (labels for the second pair of values are not legible).

Figure 4-15. System Reliability Requirements

There are many methods for assigning scores. The system engineer can assign them. A group of cognizant people can assign them independently, then resolve significant differences by discussion or by taking an average score. Various paired comparison schemes can also be used.

Complexity 

A score of ten could be assigned to the most complex device (e.g., estimate the number of active and passive parts to develop a score). Relative scores would rank other devices 1-10.

Maturity  Score 

A  score  of  ten  could  be  assigned  to  new 
devices  (i.e.,  advances  in  state-of-the-art  or 
questionable  characteristics).  Mature  devices 
should  be  rated  1-9  depending  on  the  degree 
of  maturity. 

Mission  Profile 

A score of ten could be assigned to the devices that must survive the most severe mission profile (e.g., the re-entry system must survive launch and re-entry environments and would be assigned a 10; other missile body segments complete their mission before re-entry and would receive correspondingly lower scores). On board the submarine, where the environment is essentially the same for many items, the duty cycle is used to establish the relative scores.


Criticality 

A score of one is assigned to all items whose failure would result in the loss of the mission. Higher scores are assigned to items that are backed up or may not impact the main mission (e.g., a decoy system failure would not prevent the re-entry system from reaching its target, particularly an undefended target).

Figure  4-16  illustrates  the  mechanics  of  the 
apportionment  of  a  system  reliability  re¬ 
quirement  of  0.9512  (failure  rate  0.0500  fail¬ 
ures/mission)  after  scores  have  been  assigned 
on  a  1-10  scale  in  each  of  the  four  categories. 
The  scores  are  summed  for  each  function  and 
presented  in  column  six,  titled  function  score. 
The  result  of  summing  column  six  is  103,  the 
total  re-entry  vehicle  score.  The  failure  rate 
allocated  to  each  function  is  obtained  by 
multiplying  the  system  failure  rate  (0.0500) 
by  the  ratio  of  the  function  score  for  the 
appropriate  function  over  the  total  system 
score  of  103,  as  shown  in  column  seven.  The 
last  columns  show  the  apportioned  failure 
rates  and  reliability  requirements. 
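The mechanics of score-based apportionment can be sketched as follows. Figure 4-16 is not reproduced here, so the function names and scores below are hypothetical placeholders, and the conversion R = exp(-failure rate) is an assumption consistent with the pairing of 0.9512 with 0.0500 failures/mission in the text.

```python
from math import exp, prod

def apportion(system_failure_rate, scores):
    """Allocate a failure rate in proportion to each function's summed score."""
    function_scores = {name: sum(s) for name, s in scores.items()}
    total_score = sum(function_scores.values())
    result = {}
    for name, score in function_scores.items():
        rate = system_failure_rate * score / total_score
        result[name] = (rate, exp(-rate))        # (failure rate, reliability)
    return result

# Hypothetical scores: (complexity, maturity, mission profile, criticality).
scores = {
    "separation": (3, 2, 5, 1),
    "attitude control": (8, 6, 9, 1),
    "decoy": (5, 7, 9, 8),
}
allocation = apportion(0.0500, scores)
total_rate = sum(rate for rate, _r in allocation.values())
system_reliability = prod(r for _rate, r in allocation.values())
```

Because the allocated rates sum to the system rate, the product of the apportioned reliabilities recovers the system requirement.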

It  should  be  noted  that  when  a  function  or 
hardware  component  whose  reliability  is 
known  is  to  be  used  in  a  new  design,  the 
proper  approach  is  to  recognize  this  fact  initi¬ 
ally  and  factor  this  function  or  component 
out  of  the  apportionment.  Since  the  reli¬ 
ability  of  the  function  or  component  is 
known,  and  no  resources  are  to  be  applied 
to  modify  it,  it  would  be  inconsistent  with 
the  objectives  of  the  apportionment  process 
to  require  other  than  its  known  reliability. 
Of  course,  later  tradeoff  studies  may  suggest 
the  desirability  of  expending  effort  to  im¬ 
prove  the  reliability  of  a  component  of  estab¬ 
lished  design. 

Assume  the  Command  Link  Function  of 
figure  4-16  is  known  to  have  a  reliability  of 
0.9910  based  upon  extensive  testing  in  an 
earlier  program,  and  that  this  function  is  to 
be used without change. Apportionment can proceed as follows:

System failure rate requirement                         0.0500
Known failure rates (in this case, the failure
  rate of the command link function)                   -0.0090
Apportion to remaining functions                        0.0410

Figure 4-17 shows that using the command link function, which is less reliable than its initial apportionment, increases the design requirements for the other functions. Figure 4-18 illustrates apportionment of the monitoring function failure rate to the component level.
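Factoring a known function out of the apportionment is a small change to the same arithmetic. In this sketch the scores for the remaining functions are hypothetical, and the conversion from reliability to failure rate again assumes R = exp(-failure rate).

```python
from math import log

system_rate = 0.0500                           # failures/mission, from the text
command_link_rate = round(-log(0.9910), 4)     # known function: about 0.0090
remaining_rate = system_rate - command_link_rate

# Hypothetical summed scores for the functions still to be apportioned.
scores = {"separation": 11, "attitude control": 24, "decoy": 29}
total = sum(scores.values())
allocated = {name: remaining_rate * s / total for name, s in scores.items()}
```

The remaining functions share 0.0410 failures/mission in proportion to their scores, so each receives a tighter requirement than it would if the full 0.0500 were available.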

As  the  design  progresses,  feasibility  studies 
are  normally  conducted  to  determine  if  candi¬ 
date  designs  will  meet  their  requirements. 
Reliability  prediction  is  a  feasibility  study 
which  permits  comparison  of  the  predicted 
reliabilities  with  required  reliabilities.  When  a 
prediction  is  lower  than  the  requirement,  it  is 
desirable  to  assign  resources  in  the  manner 
which  achieves  the  necessary  reliability  im¬ 
provement  at  minimum  cost.  While  general 
strategies,  such  as  increased  derating  or  use  of 
higher  grade  parts,  produce  an  overall  gain  in 
reliability,  concentration  on  specific  problem 
areas  by  adding  redundancy  or  redesigning  a 
component or equipment may be more cost-effective.

For  a  few  items  substantive  improvement 
may  not  be  feasible.  This  might  be  true  for 


example,  for  an  off-the-shelf  component  of 
established  design,  which  had  already  benefit¬ 
ed  from  one  or  more  product  improvement 
programs.  In  general,  an  analyst  can  not  form¬ 
ulate  component  reliability  as  a  deterministic 
function  of  resources  expended  for  reliability 
improvement,  but  it  is  often  possible  to  esti¬ 
mate  these  functional  relationships  subjective¬ 
ly  using  past  experience  as  a  guide.  Reliability 
improvements  realizable  by  various  actions 
can  be  listed  along  with  expected,  optimistic, 
and  pessimistic  predictions  of  the  cost  of  each 
action.  With  the  above  information  available, 
simple  cost-benefit  analysis  techniques  can  be 
applied  to  allocate  resources  effectively. 

When  it  is  determined  that  redundancy  is 
required  to  meet  a  reliability  requirement,  the 
analyst  can  make  use  of  techniques  such  as 
Kettelle's algorithm (cost and reliability trade-off) [6, 7] for optimal allocation of redundancy.

Figures  4-16  through  4-18  allocate  the 
allowable  failure  rate;  it  is  also  possible  using 
similar methods to allocate the failure probability, Q, or the reliability, R, directly.

Figure 4-16. Functional Reliability Apportionment

Figure 4-17. Functional Reliability Apportionment with Command Link Reliability Known

Figure 4-18. Component Reliability Apportionment of Monitoring Function


4.2.3.2 Availability Apportionment

System availability is a joint function of the availability of the system's equipments. Each of these, in turn, is a joint function of the reliability and maintainability of the equipment, which can combine in various proportions to yield a given level of availability. In its initial configuration a system will possess certain inherent levels of reliability and maintainability, which together establish an upper limit on the availability attainable by the system. Beyond this limit, additional effort can be applied to develop either increased reliability or improved maintainability or both, in order to increase availability. Figure 2-1 illustrates the dependence of availability on reliability and maintainability. Specification of an availability requirement defines the ordinate on the curve of figure 2-1. Move horizontally to an MTBF curve and vertically downward to the corresponding MTTR. A trade-off region exists along the ordinate since an infinite number of MTBF-MTTR pairs can satisfy the requirement. A variety of criteria can form a basis for such a trade-off. One method considers the marginal costs of improving reliability and maintainability. Appendix A includes an example of such a trade-off.

When  a  trade-off  has  been  made,  as  the 
final  step  of  system  analysis,  availability 
requirements  may  be  apportioned  to  equip¬ 
ments  in  a  consistent  and  logical  manner.  The 
objective  of  apportionment  is  to  provide  goals 
against  which  the  availability  growth  of  the 
system’s  elements  can  be  measured,  and  to 
provide  designers  with  goals  for  reliability  and 
maintainability. A procedure for reliability apportionment, considering the factors of complexity, state-of-the-art, duty cycle, and criticality, was described in the previous paragraph. Apportionment of maintainability can be done in similar manner, based on one or more of the same factors or based on considerations of location, packaging or physical configuration of the system. But unlike reliability, maintainability requirements cannot logically be apportioned to assembly levels below the lowest levels specified as repairable on-site under the users' maintenance policies.

The MTTR of a system is the average of its subsystem or equipment MTTRs, each weighted by the failure rate $\lambda_i$ of the subsystem or equipment, and written:

$$MTTR_S = \frac{\sum_i (\lambda_i \, MTTR_i)}{\sum_i \lambda_i}$$

If the system is series related with exponential failure and repair rates, its failure rate is $\lambda_S = \sum_i \lambda_i$ as previously defined. Thus the system repair rate $\mu_S = 1/MTTR_S$ can be written as:

$$\mu_S = \frac{\lambda_S}{\sum_i (\lambda_i / \mu_i)}$$

A commonly used criterion for maintainability apportionment is the condition

$$\frac{\lambda_1}{\mu_1} = \frac{\lambda_2}{\mu_2} = \cdots = \frac{\lambda_n}{\mu_n}$$

which results in subsystem MTTR apportionments inversely proportional to subsystem failure rates; that is, lower MTTR requirements are assigned to subsystems having higher failure rates. Applying that condition to a system of n subsystems gives:

$$\mu_i = n \lambda_i (\mu_S / \lambda_S)$$

For example, assume a shipboard catapult
consisting of launching, braking and retraction
subsystems has a system requirement of
1850 mean launch cycles between failure.
The system availability requirement is A = .98
and the reliability apportionment process has
resulted in the failure rate apportionments
shown in figure 4-19.

Figure 4-19. Catapult Model and Apportioned Subsystem Failure Rates

Average launch rate is 10.67 launches per hour,
so the apportioned system MTBF is
1850/10.67 = 173.38 hours.
Thus, the .98 availability requirement dictates
MTTR not greater than

$$\mathrm{MTTR}_s = \mathrm{MTBF}_s \left( \frac{1 - A}{A} \right) = 173.38 \left( \frac{.02}{.98} \right) = 3.53 \text{ hours}.$$

Subsystem MTTR requirements are to be apportioned
so that $\mathrm{MTTR}_s$ = 3.53 hours.
Therefore the expression

$$\mathrm{MTTR}_j = \frac{1}{\mu_j} = \frac{\lambda_s \, \mathrm{MTTR}_s}{n \lambda_j}$$

is applied to each subsystem failure rate, yielding
the MTTR requirements listed in figure 4-20.

Figure 4-20. Apportioned Subsystem MTTRs
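The catapult arithmetic can be sketched in Python as a check on the procedure. The subsystem failure rates of figure 4-19 are not reproduced in this text, so the 50/30/20 percent split of the system failure rate used below is purely a hypothetical placeholder, and the function names are illustrative only.

```python
def required_mttr(mtbf_hours, availability):
    """Largest MTTR that still satisfies A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours * (1.0 - availability) / availability


def apportion_mttr(failure_rates, system_mttr):
    """Apportion MTTR so that lambda_j * MTTR_j is equal for every subsystem.

    Implements MTTR_j = lambda_s * MTTR_s / (n * lambda_j).
    """
    n = len(failure_rates)
    lambda_s = sum(failure_rates.values())
    return {name: lambda_s * system_mttr / (n * lam)
            for name, lam in failure_rates.items()}


mtbf_s = 1850 / 10.67                  # hours, from the catapult example
mttr_s = required_mttr(mtbf_s, 0.98)   # roughly 3.5 hours

# HYPOTHETICAL split of the system failure rate (figure 4-19 values unavailable).
lambda_s = 1.0 / mtbf_s
rates = {
    "launching": 0.5 * lambda_s,
    "braking": 0.3 * lambda_s,
    "retraction": 0.2 * lambda_s,
}

subsystem_mttr = apportion_mttr(rates, mttr_s)

# Check: the failure-rate-weighted average of the apportioned MTTRs
# reproduces the system MTTR requirement.
weighted = sum(rates[k] * subsystem_mttr[k] for k in rates) / sum(rates.values())

for name, value in subsystem_mttr.items():
    print(f"{name:10s} MTTR = {value:5.2f} h")
print(f"system MTTR check = {weighted:.2f} h")
```

With these placeholder rates the subsystem with the highest failure rate receives the shortest MTTR allowance, as the criterion intends. Carried to more digits, the system requirement evaluates to roughly 3.54 hours, marginally above the 3.53 quoted in the example.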


Maintainability  apportionment  can  be  sim¬ 
plified  in  cases  where  it  can  be  shown  that 
peripheral  conditions  unique  to  the  operation¬ 
al  environment,  such  as  access  limitations  or 
the  availability  of  diagnostic  equipment,  are 
the  principal  factors  acting  to  determine 
maintenance  time.  In  many  such  instances, 
the  variable  element  of  maintenance  time  is 
tightly  distributed  about  a  central  value  fixed 
by  factors  such  as  those  noted  above  which 
are  outside  the  designer’s  control.  Where  the 
dispersion  is  a  negligible  fraction  of  the  cen¬ 
tral  value,  maintenance  time  may  be  treated 
as  a  constant  for  analytical  purposes.  If  the 
quantity  of  preventive  maintenance  and  the 
downtime  necessitated  by  the  need  for  pre¬ 
ventive  maintenance  are  significant  functions 
of  system  design,  these  factors  contribute 
additional  degrees  of  freedom  to  the  appor¬ 
tionment  task. 


By  the  time  trade-off  studies  are  complet¬ 
ed,  specifications  containing  tentative  reli¬ 
ability  and  maintainability  requirements,  de¬ 
termined  subjectively,  may  already  have  been 
written  for  many  equipments.  Specifications 
prepared  in  that  manner  are  not  optimal  be¬ 
cause,  by  implementing  the  results  of  the 
trade-offs, the same availability may be
achieved  with  reduced  expenditure  of  effort 
or  more  availability  may  be  gained  for  the 
same  effort.  Thus,  when  a  contractor  appor¬ 
tions  reliability  and  maintainability  goals,  in 
effect  he  apportions  resources  and  effort  as 
well.  One  of  the  purposes  of  apportionment  is 
to permit equipment requirements to be defined
objectively, so that the system requirements
can be realized in a timely and economical
manner. If the prediction indicates
that  the  system  will  not  meet  its  apportioned 
availability  or  reliability  requirements,  then 
additional  design  effort  is  required.  Even 
when  the  prediction  indicates  that  a  system 
can  be  expected  to  meet  or  exceed  its  require¬ 
ments,  trade-off  studies  may  be  useful  to 
optimize the design [8].

4.2.4  Reliability  and  Availability  Prediction 

4.2.4.1 Reliability Prediction

Prediction  is  accomplished  by  solving  the 
reliability  model  using  appropriate  failure 
rates  at  part  or  component  level.  Failure  rates 
for  use  in  prediction  can  come  from  sources 
such as MIL-HDBK-217[2]; NPRD-1, Nonelectronic
Parts Reliability Data [9]; the
Government-Industry  Data  Exchange  Program 
(GIDEP),  or  may  be  derived  by  the  contractor 
by  observation  of  his  own  products  in  tests  or 
in  service,  if  a  sufficiently  large  body  of  such 
data  can  be  obtained  for  study.  Failure  rates 
must  be  corrected  for  applied  and  induced 
stress  levels  and  duty  cycles  as  determined 
by  the  mission  analysis. 

4.2.4.1.1 Purpose of Prediction

Reliability  prediction  shall  be  used  in 
formulating  design  decisions.  The  reliability 
prediction should begin in the design phase
and continue during the design effort. Early
predictions may be based primarily on part
counts or known reliability of similar components.
As design information becomes
available,  predictions  can  be  updated  using 
stress  data  on  specific  parts  and  reflecting 
the  actual  components  utilized  in  the  design. 
Reliability prediction has several purposes:

a. as a basis of selection among competing
designs (predictions should use the same data
sources and assumptions),

b. to disclose critical or reliability limiting
items in the design,

c. to determine sensitivity of the design to
electrical stress, thermal stress and part quality,

d. as a basis for reliability trade-offs among
system components,

e. to describe numerically the inherent
reliability of the design,

f. to provide inputs to Design Review;
Failure Mode, Effects, and Criticality Analysis
(FMECA); Maintainability Analysis; Safety
Analysis; Logistic Support; and Thermal
Design.

4.2.4.1.2 Policy

SSPO  policy  as  reflected  in  NAVSEA  OD 
21549[1] is to require initial, intermediate
and  final  reliability  predictions.  An  initial 
prediction  forecasts  the  reliability  of  the  pro¬ 
jected  final  product.  This  forecast  is  based 
on  the  characteristics  of  the  early  design  and 
improvements  expected  during  the  develop¬ 
ment  phase.  An  intermediate  prediction  up¬ 
dates  the  initial  forecast.  The  update  is  based 
on  increased  design  information,  including 
environmental  data  and  internal  stress  in¬ 
formation.  A  final  prediction  is  based  on  the 
design  submitted  for  final  design  review.  It 
predicts  the  operational  reliability  of  the  item 
based  on  all  relevant  information  available  at 
that  point  in  the  program. 

Predictions  should  be  performed  using  the 
most  realistic  failure  rates  available.  Data 
from  almost  identical  hardware  used  in  al¬ 
most  identical  applications  should  provide  a 
more  realistic  data  base  for  predictions  than 
average failure rates from MIL-HDBK-217[2]
and  RADC  publications.  The  depth  of  the 
prediction  analysis  should  be  consistent  with 
the  level  of  design  definition  available. 


4.2.4.1.3 Prediction Methods


Reliability  prediction  requires  knowledge 
of  the  quality  of  parts  that  will  be  employed 
(commercial, JAN, JANTX, JANTXV), the
level  to  which  the  parts  will  be  screened,  the 
temperature  at  which  the  parts  will  be  used, 
the  degree  to  which  the  parts  will  be  elec¬ 
trically  and  environmentally  derated,  and  any 
redundancies  employed.  Figures  4-21  and 
4-22 illustrate the prediction process.
MIL-HDBK-217[2] describes the most widely used
prediction  methods  for  electrical,  electro¬ 
mechanical  and  electronic  parts. 

The  reliability  prediction  of  nonelectronic 
parts,  such  as  gaskets,  seals,  valves,  clutches, 
etc.,  is  accomplished  using  various  sources  of 
failure  rate  data.  These  sources  include  (a) 
NPRD-1, “Nonelectronic Parts Reliability
Data” [9], published by the Reliability Analysis
Center at Rome Air Development Center,
(b) GIDEP (Government and Industry Data
Exchange Program), (c) vendor data and (d)
in-house data.

The  NPRD-1  document  [9]  provides  fail¬ 
ure  rates,  including  a  mean  and  upper  and 
lower limits on a 60% confidence interval, for
a limited number of devices. GIDEP provides
failure rates, including a mean and upper and
lower limits on a 90% confidence interval, for
a  wide  variety  of  devices  reflecting  various 
environments  such  as  ground,  ground  mobile, 
jet  aircraft,  missiles,  etc.  Vendor  and  in-house 
data  serves  as  a  failure  data  source  for  peculiar 
equipment  supplied  by  the  manufacturer  and/ 
or  equipment  designer.  It  may  reflect  qualifi¬ 
cation  or  environmental  test  results  or  actual 
field  use.  In  general,  the  failure  data  from 
these  various  sources  is  to  be  considered  as 
generic  and  representative  of  the  device  of 
interest.  Care  must  be  exercised  in  selecting 
a  failure  rate  for  a  particular  device  from 
these  sources  to  assure  optimum  correspon¬ 
dence  between  the  device  of  interest  and  the 
data  source  relative  to  design  similarity  and 
use  environment. 

Figure 4-21. Flow of Reliability Prediction Activities and Information

Figure 4-22. Reliability Prediction Activity

Early prediction assumptions generally
include:

• Part Quality - Early predictions are
usually based on assumptions as to the quality
of parts and screening. The predictions are
refined when the quality of parts is known.
MIL-HDBK-217[2] uses a factor, $\pi_Q$, to
account for part quality.

•  Use  Environment  -  In  early  predictions 
it  may  be  necessary  to  make  assumptions  re¬ 
garding  the  environment  in  which  parts  will 
be used. Later, analyses will provide better
knowledge of the use environment.
MIL-HDBK-217[2] uses a factor, $\pi_E$, to account
for the use environment.

• Application Review - Early predictions
are usually based on assumed derating rules
(e.g., all parts will be used at 50% of rated
load). During the program, a part usage and
application review should establish more
accurate application and derating factors for
each part. Knowledge of the actual derating,
duty cycle, temperature limits and similar
application factors permits much more accurate
estimates of failure rates. The analyst
should look for overstressed parts and call
them to the attention of cognizant line and
management personnel. It should be noted
that SSPO requires that worst case conditions
(e.g., environmental, duty cycle and derating)
be used in prediction work.
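The way quality and environment assumptions enter an early prediction can be illustrated with a simplified multiplicative model, in which each part's base failure rate is scaled by a quality factor and a use-environment factor and the results are summed for a series arrangement. This is a sketch only: every numeric value below is hypothetical rather than taken from MIL-HDBK-217[2], and the handbook's actual part models carry additional part-specific factors.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    quantity: int
    base_rate: float   # base failure rate, failures per 10^6 hours (hypothetical)
    pi_q: float        # quality factor (hypothetical)
    pi_e: float        # use-environment factor (hypothetical)

    def failure_rate(self) -> float:
        """Adjusted failure rate for all copies of this part type."""
        return self.quantity * self.base_rate * self.pi_q * self.pi_e


def series_failure_rate(parts):
    """For a series model the equipment failure rate is the sum of part rates."""
    return sum(p.failure_rate() for p in parts)


# HYPOTHETICAL parts list for an early, parts-count style prediction.
parts = [
    Part("resistor", 40, 0.002, 1.0, 4.0),
    Part("capacitor", 25, 0.005, 3.0, 4.0),
    Part("transistor", 10, 0.020, 2.0, 4.0),
]

rate = series_failure_rate(parts)    # failures per 10^6 hours
mtbf_hours = 1.0e6 / rate
print(f"equipment failure rate = {rate:.3f} per 10^6 h, MTBF = {mtbf_hours:.0f} h")
```

As the quality of parts and the use environment become known, only the factor values change; the structure of the calculation stays the same, which makes it straightforward to update the initial prediction into intermediate and final predictions.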

4.2.4.1.4 Prediction Report

A  prediction  report  should  contain  the  best 
estimate  of  the  reliability  of  the  final  design, 
based  on  information  available  when  the  pre¬ 
diction  is  made.  The  report  must  clearly  iden¬ 
tify  the  sources  of  all  data  used.  See  section 
10  for  examples  of  prediction  work  sheets. 

When  the  predicted  reliability  is  below  the 
required  reliability,  the  report  should  provide 
recommendations  for  improvement.  The  cog¬ 
nizant  design  engineering  group  should  in¬ 
dicate  actions  planned  or  taken  to  improve 
reliability.  Reliability  demonstration  testing 
should  not  be  started  while  predicted  reliabil¬ 
ity  is  below  the  requirement.  Instead,  redesign 
should  be  undertaken  and  verified  by  per¬ 
forming  evaluation  tests  of  the  redesign  ef¬ 
fort. 


4.2.4.2 Availability Prediction

Availability  prediction  is  accomplished  by 
predicting  MTBF  and  MTTR.  Development  of 
reliability  predictions  has  been  discussed 
above.  MTBF  is  the  arithmetic  mean  or  statis¬ 
tical  expectation  of  time  between  successive 
failures.  Prediction  of  maintainability  indices 
is  discussed  below. 

4.2.4.2.1  Analysis  of  Corrective  Maintenance 
Tasks  and  Prediction  of  Availability 
with  Respect  to  Failure 

After the availability model has been written,
a listing, based on the maintenance concept,
is made of corrective maintenance tasks
that can arise because of failures of each of
the equipment blocks, together with estimates
of their failure rates and repair times (from
maintainability predictions; see section 10 for
worksheet examples). Repair time should include
the time needed for fault detection and isolation.
A form such as figure 4-23 can be used
to expedite the analysis. For series equipment,
the sum of the failure rates of the components
is a prediction of the equipment failure
rate. The sum of the $\lambda M_c$ column divided
by the equipment failure rate is a prediction
of the expected repair time. The final column
is an approximation of the inherent availability
or fractional up-time of the equipment
block with respect to failure.

The prediction may be based on any of the
procedures of MIL-HDBK-217[2], MIL-STD-756[10]
and/or MIL-HDBK-472[11]. Source
data  may  be  based  on  historical  experience, 
subjective  evaluation,  expert  judgment  or 
direct  measurement  of  reliability  and  main¬ 
tainability  characteristics  of  elements  of  the 
system.  However,  the  contractor  may  elect  to 
use  a  non-standard  method  specifically  ap¬ 
plicable  to  the  type  of  hardware  comprising 
the  system,  subject  to  approval  by  SSPO. 

Rules  for  developing  system  parameters 
from  those  of  lower  assemblies  depend  on  the 
usual  assumptions  of  statistical  independence 
and  exponential  behavior.  Experience  has 
shown  that  these  assumptions  are  valid  for 
many  systems. 
Figure 4-23. Analysis of Corrective Maintenance Tasks

MTTR ($\bar{M}_c$) is found as the average of
mean repair times of all components repairable
on line, weighted for the relative failure
rate of each component:

$$\bar{M}_c = \frac{\sum_{i=1}^{n} \lambda_i M_{c_i}}{\sum_{i=1}^{n} \lambda_i} \qquad (4\text{-}18)$$
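The worksheet arithmetic of figure 4-23 and equation (4-18) can be sketched as follows. The component names and values are hypothetical, and the availability line uses the standard inherent availability relation A = MTBF / (MTBF + MTTR), which is assumed here to be what the final worksheet column approximates.

```python
def corrective_maintenance_summary(components):
    """Summarize a corrective maintenance worksheet for series equipment.

    components: list of (name, failure_rate_per_hour, mean_repair_hours).
    Returns (equipment failure rate, MTTR, inherent availability).
    """
    total_rate = sum(lam for _, lam, _ in components)
    total_lam_mc = sum(lam * mc for _, lam, mc in components)
    mttr = total_lam_mc / total_rate          # equation (4-18)
    mtbf = 1.0 / total_rate
    availability = mtbf / (mtbf + mttr)       # inherent availability
    return total_rate, mttr, availability


# HYPOTHETICAL worksheet rows: (component, lambda per hour, Mc in hours)
rows = [
    ("power supply", 200e-6, 1.5),
    ("receiver", 500e-6, 0.8),
    ("display", 300e-6, 2.0),
]

total_rate, mttr, availability = corrective_maintenance_summary(rows)
print(f"failure rate = {total_rate:.6f} per hour")
print(f"MTTR = {mttr:.2f} h, inherent availability = {availability:.5f}")
```

When the product of failure rate and MTTR is small, the availability is close to one minus that product, which is why a simple worksheet column can serve as an approximation.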
4.2.5 Failure Mode, Effects, and Criticality Analysis

Failure Mode, Effects, and Criticality
Analysis  (FMECA)  is  a  systematic,  organized 
design  evaluation  procedure  which: 

a.  Identifies  potential  failure  modes,  their 
causes and method of detection [visual,
manual,  automatic  (PM/FL)]  at  the  level  of 
hardware  of  interest  (e.g.,  system,  subsystem, 
equipment  or  component).  This  includes  im¬ 
pact  of  dummy  loads,  fans,  and  air  condi¬ 
tioners  which  can  fail  and  pull  the  system 
down. 

b.  Determines,  by  analysis  and  evaluation, 
the  effect  of  each  failure  mode  on  the  hard¬ 
ware  element  in  which  it  occurs,  on  the  next 
higher  assembly  and  ultimately  on  system 
operation,  mission  objectives  and  crew  safety. 

c.  Establishes  criticality  level  of  each  fail¬ 
ure  mode,  permitting  ranking  of  failure  modes 
relative  to  effect  on  the  mission. 

d.  Predicts  probability  of  occurrence  of 
each  failure  mode,  permitting  ranking  of  fail¬ 
ure  modes  relative  to  likelihood. 

e.  Provides  a  suitable  basis  for  assigning 
priorities  to  failure  mode  corrective  actions 
through  the  joint  consideration  of  criticality 
and  probability  of  occurrence. 

f.  Documents  results  in  an  orderly  fashion 
to  highlight  deficiencies  (reliability,  detection 
capability)  and  safety  problems,  recommend 
corrective  action,  identify  changes  needed  in 
test  programs,  and  aid  in  the  development  of 
operating  and  maintenance  manuals. 

g.  Provides  timely  inputs  to  design  reviews. 

h.  Provides  feedback  of  information  to 
cognizant  contractor  organizations  (e.g.,  test, 
design,  reliability,  maintainability,  systems). 


4.2.5.1 Purpose of the FMECA

The  purpose  of  FMECA  is  to  evaluate  the 
design  by  analysis  in  the  early  design  stages. 
Specifically,  the  FMECA  is  to  identify  poten¬ 
tial  failure  modes  and  to  define  their  crit¬ 
icality  so  that  informed  decisions  can  be  made 
about  the  worthiness  of  the  design  and  the 
necessity  for  corrective  actions. 

FMECA  is  performed  in  order  to  prevent 
problems,  to  eliminate  failure  modes  during 
early  design  stages  and  before  they  actually 
occur  in  operational  use. 

The  results  of  the  FMECA  are  valuable  in 
test  program  planning  and  in  determining  the 
need  for  automatic  monitoring,  fault  detec¬ 
tion  or  alarm  design  features. 

A  completed  FMECA  consists  of: 

a.  An  orderly  list  of  failure  modes  and 
their  causes. 

b.  A  classification  or  ranking  of  the  failure 
modes  with  regard  to  their  impact  on  per¬ 
formance  and  safety. 

c. The probability of occurrence of each
failure mode, or a ranking with regard to
expected frequency of occurrence.

d.  An  identification  of  existing  design  fea¬ 
tures  (e.g.,  isolation  or  fault  tolerant  tech¬ 
niques),  screening  procedures  (e.g.,  improved 
part  quality),  etc.  that  will  minimize  or  ob¬ 
viate  the  effects  of  potential  failure  modes  or 
reduce  their  probability. 

e.  Recommendations  for  precluding  or  cir¬ 
cumventing  significant  failure  modes  or  for 
reducing  their  probability  of  occurrence. 

f.  A  description  of  alarms  or  other  means 
of detecting the failure mode and the frequency
with which the mode can be detected
(e.g.,  instantaneously,  during  daily  checkout, 
etc.). 

g.  Criteria  for  test  planning  and  the  design 
of  test  and  checkout  systems  which  are  re¬ 
sponsive  to  identified  failure  modes  and 
safety  hazards. 

h.  Criteria  for  logistics  planning  and  main¬ 
tainability  analysis  by  inclusion  of  informa¬ 
tion  for  selection  of  preventive  maintenance 
points  and  development  of  trouble  shooting 
guides. 
i. Identification of single failure points in
circuits for worst case analysis; failure modes
involving parameter drifts may require worst
case analysis to determine criticality (see
NAVSEA OD 21549[1] for most severe design
analysis requirements).

j.  Input  data  for  trade-off  studies  and  for 
establishing  corrective  action  priorities. 

k.  Historical  documentation  for  future 
reference  to  aid  in  analysis  of  test  and/or  field 
failures  for  consideration  of  design  changes, 
and  as  an  aid  in  future  development  efforts. 

4.2.5.2 FMECA Method

FMECA  is  an  interdisciplinary  study  re¬ 
quiring  skills  in  system  and  equipment  (hard¬ 
ware  and  software)  design,  reliability  analysis 
and  data  utilization,  maintainability,  safety, 
probability  concepts,  testing,  modeling,  and 
associated  mathematics.  While  an  individual 
may  be  assigned  responsibility  for  a  FMECA, 
he  will  require  team  support  in  order  to  pro¬ 
duce  results  of  significant  substance,  because 
so  many  skills  are  involved  in  the  analysis. 

FMECA  is  a  detailed  analysis  of  an  equip¬ 
ment,  subsystem  or  system.  It  is  necessary  to 
understand  how  the  device  operates  and  how 
it  interfaces  with  other  devices  to  perform  a 
mission.  The  analyst  must  explore  the  effects 
of  various  part  faults  or  functional  failures  on 
the equipment and ultimately the system
[12, 13, 14, 15].

a.  Gathering  Information 

A  variety  of  information  is  required  to  pro¬ 
duce  a  meaningful  FMECA.  Figure  4-24  iden¬ 
tifies  the  types  of  information  which  are 
usually  accumulated  prior  to  performing  a 
FMECA.  Figure  4-25  shows  how  individual 
pieces  of  information  and  tasks  are  organized 
to  produce  a  FMECA.  Figure  4-26  illustrates  a 
functional  block  diagram  useful  for  perform¬ 
ing  a  FMECA. 
Figure 4-24. General Steps, Information, Sources
and Interfaces for FMECA (labels recovered from the
figure: Block Diagrams, Maintenance Concepts,
System Requirements Analysis)


Analysts  performing  a  FMECA  must  first 
acquire  full  understanding  of  the  design  and 
how  it  works,  then  focus  on  how  the  design 
can  fail. 

FMECA  should  consider  the  lowest  hard¬ 
ware  level  for  which  adequate  design  defini¬ 
tion  exists.  Every  credible  potential  failure 
mode  should  be  identified  and  classified  re¬ 
lative  to  probability  of  occurrence  and  effect 
on  the  system,  mission,  or  crew  safety. 

Compensating  features  and  existing  detec¬ 
tion  capability  are  analyzed  and  additional 
compensating  features  or  detection  capability 
are  recommended  for  every  failure  mode  for 
which  there  is  a  significant  probability  of 
aborting  the  mission  or  creating  an  unaccept¬ 
able  safety  hazard. 




Figure  4-25.  FMECA  Process 
FMECA is a single-point failure analysis
technique; i.e., each failure mode is considered
individually. For example, the analysis
could be performed on a top-down breakdown
item that is critical. The analysis should
be performed to the level where the problem
can be identified and corrective action can be
taken. Modes are ranked to reflect both the
probability of occurrence and the severity of
the failure effect relative to the hardware’s
mission or crew safety. Corrective action
should consider use of higher reliability parts,
redundancy, alternate modes of operation,
use of protective devices, improved maintenance
accessibility, or designation of the item as a
candidate for developmental testing and
reliability improvement programs. All of this
information is documented by completing FMECA forms.

In  addition  to  the  expected  environments 
of  the  operating  mission,  a  FMECA  should 
also  take  into  account  failure  modes  and 
effects  associated  with  transportation  and 
storage  environments.  The  effects  of  these 
environments are of particular concern for
hardware that cannot be tested effectively
(e.g., arming and fuzing systems, squibs).
However,  even  for  hardware  that  can  be 
tested,  these  environments  affect  availability. 

b.  FMECA  Scheduling 

To  influence  design  decisions,  FMECA 
must  be  completed  and  results  available  when 
design  reviews  are  held.  Design  reviews  repre¬ 
sent  key  milestones  in  a  system  development 
program.  An  initial  FMECA  should  be  avail¬ 
able  to  support  preliminary  design  review 
(PDR)  and  an  updated  FMECA  should  be 
available  for  critical  design  review  (CDR). 

c.  FMECA  Worksheet 

A  number  of  generally  similar  forms  are  in 
use  for  documenting  FMECA.  While  no  one 
form  is  applicable  to  all  programs,  it  is  usually 
easy  to  adapt  a  form  to  a  specific  program. 
Figure  4-27  presents  typical  FMECA  work¬ 
sheets. 

d.  FMECA  Procedure 

A FMECA is completed as follows:

Step  1.  Block  Diagram  -  A  functional 
block  diagram  may  be  prepared  to  describe 
relationships among elements of the hardware
at a particular assembly level. For example, in
a  diagram  of  a  subsystem,  each  block  repre¬ 
sents  an  equipment;  in  an  equipment  diagram 
each  block  represents  a  component.  The  di¬ 
agram  should  make  clear  the  functional  rela¬ 
tionships  of  each  block  to  the  others;  the 
nature  and  magnitude  of  inputs  and  outputs 
should  also  be  labeled.  Each  block  may  be 
designated  by  an  item  number  for  use  in  com¬ 
pleting  the  FMECA  form.  Figure  4-26  is  an 
example  of  a  functional  block  diagram  at  sub¬ 
system  level.  It  partitions  hardware  for  anal¬ 
ysis  at  the  equipment  level.  Since  not  all 
hardware  fits  the  typical  part-component- 
equipment-subsystem-system  pattern,  cases 
may  be  encountered  where  the  system  must 
be  partitioned  arbitrarily.  In  those  cases  the 
hardware  should  be  grouped  for  analysis  in 
the  way  that  seems  simplest  and  most  logical. 
It  may  be  better  to  base  such  a  grouping  on 
functional  rather  than  placement  or  packaging 
considerations. 

Step  2.  Failure  Modes  -  Each  block  of  the 
block  diagram  is  considered  in  succession.  All 
credible failure modes are listed, both degradative
and catastrophic. It is important to list
not only what is expected to happen but
every failure mode that can happen.
This usually requires some consolidation of
simple failure events, particularly when considering
higher levels of assembly such as
equipments or subsystems. For example, all
of the numerous failures that can affect an
amplifier in continuous use can be summarized
in three failure modes: no output,
gain out of tolerance, and noise or distortion of
the amplified signal. Similarly, for many
switches,  all  failure  possibilities  can  be  sum¬ 
marized  by  considering  that  the  switches 
may  fail  open  when  they  should  remain 
closed,  may  close  (short)  when  they  should 
be  open,  or  may  have  excessively  high  cir¬ 
cuit  resistance.  In  reviewing  the  ways  hard¬ 
ware  can  fail,  it  is  important  to  assess  possible 
effects  of  environmental  stresses  as  well  as 
operating  stresses.  For  this  reason  it  is  impor¬ 
tant  that  the  best  available  environmental 
envelope  be  prepared  during  mission  analysis 
before  beginning  FMECA. 

Step 3. Causes of Failure - Beside each
failure  mode  are  listed  all  the  causes  believed 
capable  of  giving  rise  to  the  failure.  This  step 
may  also  call  for  summarizing  some  of  the 
Figure 4-27A. Typical FMECA Worksheet

Figure 4-27B. Typical FMECA Worksheet

with the failure mode and determines its
effect,  whereas  FTA  starts  with  a  specific 
undesired  event  and  determines  the  potential 
causes  using  combinatorial  logic.  FTA  is  a 
valuable  tool  for  summarizing  the  results  of 
a FMECA and is also useful for preparing
trouble  shooting  manuals. 

4.2.6.1 FTA Method

FTA  involves  the  following  procedure: 

a.  System  Definition 

Define  (1)  performance  and  safety  of  the 
system, (2) relationships between system performance
and that of lower level assemblies, (3)
human,  hardware,  and  software  interfaces, 
and  (4)  operation  of  the  system. 

b.  Mission  Definition 

Define  (1)  mission  phases,  (2)  environmen¬ 
tal  profile,  (3)  duty  cycle  profile,  and  (4) 
success  criteria. 


c. Block Diagrams

Block diagrams depict system functional
flow at the circuit/component level. Operator
interactions and external interfaces are also
included in the diagrams.

d. Fault Tree Construction

Fault  tree  construction  begins  with  the 
identification  of  the  undesired  event  that  will 
form  the  top  level  of  the  tree.  This  is  nor¬ 
mally  a  high  level  event  involving  non¬ 
performance  of  a  required  function  at  the  sys¬ 
tem  level,  although  top  events  are  sometimes 
stated  in  non-hardware  terms  such  as  occur¬ 
rence  of  personnel  injury.  For  example,  in  the 
case  of  a  nuclear  power  station,  the  top  event 
in  a  reactor  control  system  tree  might  be  core 
melt  while  the  top  event  in  the  safety  system 
tree  could  be  the  release  of  a  specified  quan¬ 
tity  of  radiation. 

The  fault  tree  is  developed  using  logic  sym¬ 
bols  (Figure  4-28)  to  trace  downward  from 
the  top  event,  through  all  levels  of  sub-events, 
to  all  of  the  elementary  events  which  contrib¬ 
ute  to  causing  the  top  event.  The  level  of 


An event, usually a fault, resulting
from the combination of one or more
basic faults.

A basic fault, usually at the component
level, which can be established
from test or failure mode analysis.

A fault not developed further as to its
causes due to lack of information,
time or value in doing so.

“Priority Gate” - the output event for
the “And Gate” can only occur when
the stipulated sequence of input events
occurs.

“Or Gate” - the output event occurs when
one or more of the input events occurs.

A conditional event which must occur
in order for an input fault (cause) to
result in an output fault (effect).

An event expected to occur in normal
operation.

“And Gate” - the output event occurs
only when all the input events occur.

“Exclusive Or Gate” - the output event
occurs only when the input events do
not coexist.

“Inhibit Gate” - the output event occurs
only when the condition exists. The inhibit
condition may be either normal operation or
the result of another fault.

Reference key to another part of the fault
tree where the identical sequence of events
appears.

Figure 4-28. Logic Symbols Used in Fault Tree Diagrams
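The gate definitions of figure 4-28 can be expressed as simple Boolean functions and combined to evaluate a top event from a set of basic events. The sketch below is illustrative only: the pump-and-valve tree is hypothetical and is not drawn from this document, and the exclusive-or gate is written for two inputs.

```python
def and_gate(*inputs):
    """Output event occurs only when all the input events occur."""
    return all(inputs)


def or_gate(*inputs):
    """Output event occurs when one or more of the input events occur."""
    return any(inputs)


def exclusive_or_gate(a, b):
    """Output event occurs only when the two input events do not coexist."""
    return a != b


def inhibit_gate(input_event, condition):
    """Output event occurs only when the input occurs and the condition exists."""
    return input_event and condition


def top_event(pump_a_fails, pump_b_fails, valve_sticks, demand_present):
    """HYPOTHETICAL tree: loss of flow while flow is demanded.

    Flow is lost if both redundant pumps fail or if the valve sticks;
    the top event is inhibited unless flow is actually being demanded.
    """
    loss_of_pumping = and_gate(pump_a_fails, pump_b_fails)
    loss_of_flow = or_gate(loss_of_pumping, valve_sticks)
    return inhibit_gate(loss_of_flow, demand_present)


# A single pump failure does not cause the top event; a stuck valve does.
print(top_event(True, False, False, True))
print(top_event(False, False, True, True))
```

Evaluating the tree over combinations of basic events in this way shows which single events or combinations are sufficient to cause the top event.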
possible causes in categories. For example, it
might not be possible to list every circuit condition
that could lead to loss of output from
an amplifier, but they might be summarized
as “circuit failures” when analyzing the subsystem.
Later, each would be reviewed as
analysis  proceeded  down  to  the  amplifier 
itself  and  the  circuits  within  it. 

Step 4. Method of Detection - Provisions
for detecting the failure mode when it occurs
are listed. Some failure modes may cause
automatic shut down to prevent additional
damage, others may result in audible or visual
alarms, and still others may go undetected.

Step 5. Effect on Function or Item Performance -
The immediate result of each
assumed failure at the next higher level of
assembly is described. It is important to list
the local effects of the failure rather than to
make a judgment of the overall significance of
the failure to the system’s performance.

Step  6.  Effect  on  System  Performance  - 
The  result  of  each  assumed  failure  mode  on 
the  system  or  highest  assembly  level  item 
being  developed  is  described. 

Step  7.  Failure  Classification  -  Failure 
modes  may  be  classified  by  the  approximate 
degree  of  degradation  resulting  from  each 
mode.  A  code  such  as  that  shown  below  is 
usually  adequate. 

Catastrophic - Failure that will create a
safety  hazard  (death  or  injury),  or  significant 
system  loss. 

Critical  -  Failure  that  will  degrade  the  sys¬ 
tem  beyond  acceptable  limits. 

Major  -  Failure  that  will  degrade  the  sys¬ 
tem  beyond  acceptable  limits  but  can  be 
adequately  controlled  or  countered  by 
alternate  means. 

Minor  -  Failure  that  does  not  degrade 
overall  system  performance  beyond  accept¬ 
able  limits. 

Each  contractor  must  define  these  terms 
specifically for his subsystem.

Step 8. Failure Probability - The frequency
of each failure mode, or the probability
of the mode, can be estimated using
methods such as those given in MIL-HDBK-217[2].
Often a simple scoring scheme is used
to group modes by relative probability.

Step 9. Compensating Provisions/Conditions -
Compensating provisions embodied
in the design, such as redundant channels or
higher reliability parts, should be listed for
each significant failure mode. It should be
stated whether the compensation is total or
partial, and whether resorting to the compensating
provisions will limit or reduce efficiency.

Step 10. Comments and/or Recommendations for Design Improvement -
Additional provisions that might feasibly be included in the
design should be listed. Additional redundancy and alternate modes
of operation are examples of compensatory provisions that might be
indicated where a failure is critical to the mission. Where a
single part failure mode is identified as catastrophic or critical
to the mission, one or more recommendations for improving the
reliability of the design should always be made.

Step 11. Closing the Loop - For a FMECA to be effective, it is
important that responsible engineering management be an integral
part of the FMECA process. Recommendations growing out of the
analysis should be evaluated by management for feasibility and
cost of implementation. Corrective action decisions resulting from
this evaluation must then be followed up and closed out by
management, closing the loop on the FMECA process.

When a FMECA is reported, the results of the analysis should be
summarized in an executive summary. The summary should include a
listing of the important failure modes disclosed in the analysis
and recommendations for eliminating them or reducing their impact.
Management can then consider and act on the recommendations. The
FMECA worksheets should be included in the FMECA report for review
by interested parties and as documentation of the analysis.

4.2.6  Fault  Tree  Analysis 

Fault Tree Analysis (FTA) [16] is a design evaluation procedure
which:

a. Identifies undesired events.

b. Graphically traces, from each undesired event selected, through
hardware failures, software and human errors which could cause the
event.

c. Estimates the probability of undesired events.

FTA differs from FMECA. FMECA begins with the failure mode and
determines its effect, whereas FTA starts with a specific
undesired event and determines the potential causes using
combinatorial logic. FTA is a valuable tool for summarizing the
results of a FMECA and is also useful for preparing
troubleshooting manuals.

4.2.6.1 FTA Method

FTA  involves  the  following  procedure: 

a.  System  Definition 

Define  (1)  performance  and  safety  of  the 
system,  (2)  relationships  between  system  per¬ 
formance  of  lower  level  assemblies,  (3) 
human,  hardware,  and  software  interfaces, 
and  (4)  operation  of  the  system. 

b.  Mission  Definition 

Define  (1)  mission  phases,  (2)  environmen¬ 
tal  profile,  (3)  duty  cycle  profile,  and  (4) 
success  criteria. 


c.  Block  Diagrams 

Block diagrams depict system functional flow at the
circuit/component level. Operator interactions and external
interfaces are also included in the diagrams.

d.  Fault  Tree  Construction 

Fault  tree  construction  begins  with  the 
identification  of  the  undesired  event  that  will 
form  the  top  level  of  the  tree.  This  is  nor¬ 
mally  a  high  level  event  involving  non- 
performance  of  a  required  function  at  the  sys¬ 
tem  level,  although  top  events  are  sometimes 
stated  in  non-hardware  terms  such  as  occur¬ 
rence  of  personnel  injury.  For  example,  in  the 
case  of  a  nuclear  power  station,  the  top  event 
in  a  reactor  control  system  tree  might  be  core 
melt  while  the  top  event  in  the  safety  system 
tree  could  be  the  release  of  a  specified  quan¬ 
tity  of  radiation. 

The fault tree is developed using logic symbols (Figure 4-28) to
trace downward from the top event, through all levels of
sub-events, to all of the elementary events which contribute to
causing the top event. The level of elementary events can be
arbitrarily defined for any fault tree; however, it usually refers
to (1) hardware failures of basic components such as those found
in MIL-HDBK-217[2] and other failure rate handbooks, (2) human
actions, (3) software errors and (4) occurrences of nature such as
fire, water, wind and earthquakes.

An event, usually a fault, resulting from the combination of one
or more basic faults.

A basic fault, usually at the component level, which can be
established from test or failure mode analysis.

A fault not developed further as to its causes due to lack of
information, time or value in doing so.

A conditional event which must occur in order for an input fault
(cause) to result in an output fault (effect).

An event expected to occur in normal operation.

"And Gate" - the output event occurs only when all the input
events occur.

"Priority Gate" - the output event for the "And Gate" can only
occur when the stipulated sequence of input events occurs.

"Or Gate" - the output event occurs when one or more of the input
events occurs.

"Exclusive Or Gate" - the output event occurs only when the input
events do not coexist.

"Inhibit Gate" - the output event occurs only when the condition
exists. The inhibit condition may be either normal operation or
the result of another fault.

Reference key to another part of the fault tree where the
identical sequence of events appears.

Figure 4-28. Logic Symbols Used in Fault Tree Diagrams

The tree is constructed by linking the top event to its immediate
causes or sub-events using the appropriate gate symbol from figure
4-28, then linking these sub-events to their causes and continuing
in turn until the desired elementary cause level is reached. In
the nuclear power example, the core melt tree might look like
figure 4-29.

The top event is linked to its immediate causes by an AND gate
since core melt can occur only if the reactor control systems and
safety systems fail simultaneously. Developing the safety system
branch of the tree, the next gate would also be an AND gate for
the typical redundant safety system; however, the first gate under
a particular safety channel would be an OR since any of the events
shown could disable that safety channel.
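
The gate logic just described can be sketched as Boolean functions. This is only an illustration: the events drawn in figure 4-29 are not listed in the text, so the per-channel event names (sensor, logic, actuator) and the two-channel arrangement are hypothetical placeholders.

```python
# Illustrative sketch of the core melt logic; all event names are hypothetical.

def channel_disabled(sensor_fails: bool, logic_fails: bool, actuator_fails: bool) -> bool:
    """OR gate: any one of the listed events disables the safety channel."""
    return sensor_fails or logic_fails or actuator_fails


def safety_system_fails(channel_states: list) -> bool:
    """AND gate: a redundant safety system fails only if every channel is disabled."""
    return all(channel_states)


def core_melt(control_system_fails: bool, channel_states: list) -> bool:
    """Top AND gate: control and safety systems must fail together."""
    return control_system_fails and safety_system_fails(channel_states)


channel_1 = channel_disabled(True, False, False)    # a single event is enough
channel_2 = channel_disabled(False, False, False)   # healthy channel
one_channel_left = core_melt(True, [channel_1, channel_2])
no_channel_left = core_melt(True, [channel_1, True])
```

With one healthy channel remaining the top event does not occur; it occurs only when the control system and every safety channel have failed.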

Fault trees are subjected to both qualitative analysis and
quantitative analysis. Qualitative analysis consists of
determining the various combinations of elementary events that
will cause the top event to occur and is used to: locate single
point failures, assess criticality of components, identify common
mode failures, evaluate redundancy and determine the relative
importance of general fault categories; i.e., hardware, software,
human error and nature. Quantitative analysis consists of
determining the probability of occurrence of the top event from
the probabilities of occurrence of the elementary input events.
Quantitative analysis is always preceded by qualitative analysis
since the quantification methods given here are valid only under
certain conditions and those conditions are ensured by performance
of the indicated qualitative analysis.


Figure 4-29. Fault Tree - Core Melt

e. Qualitative Analysis

For a simple fault tree the relationship between top event
occurrence and elementary event occurrence can be determined by
inspection. In the tree of figure 4-30, it can be seen that the
top event A will occur in either of two cases: (1) event D occurs
or (2) events E and F occur together. The tree can be redrawn as
in figure 4-31 to more clearly reflect the true relation of event
A to input events D, E and F.
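
The equivalence of the two drawings can be checked by enumerating every combination of the basic events. The sketch below assumes a structure for figure 4-30 that is inferred from the later text (A is an AND gate over B and C, with D feeding both B and C through OR gates); the figure itself is not reproduced here, so treat the gate layout as an assumption.

```python
from itertools import product


def tree_as_drawn(d: bool, e: bool, f: bool) -> bool:
    """Assumed figure 4-30 layout: A = B AND C, B = D OR E, C = D OR F."""
    b = d or e
    c = d or f
    return b and c


def tree_redrawn(d: bool, e: bool, f: bool) -> bool:
    """Figure 4-31 relation stated in the text: A occurs if D, or if E and F."""
    return d or (e and f)


# Compare the two forms over all eight combinations of D, E and F.
equivalent = all(
    tree_as_drawn(d, e, f) == tree_redrawn(d, e, f)
    for d, e, f in product([False, True], repeat=3)
)
```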


Figure 4-30. Fault Tree Example

Figure 4-31. Fault Tree Example - Redrawn

In a more complex tree it is quite difficult and sometimes
impossible to reduce the tree to its simplest form by inspection.
The method of minimal cut sets, described in § 4.2.2.1.2.4, may be
applied to provide a systematic way of reducing any fault tree to
a form which is free of repeated inputs and therefore amenable to
quantitative analysis. The idea of the cut set algorithm, first
stated by Fussell and Vesely [17], is to replace each gate by its
inputs of gates and basic events until a list matrix is
constructed that contains only basic events. The algorithm is
based on the fact that an AND gate increases the size of a cut set
while an OR gate increases the number of cut sets, hence AND gate
inputs are listed as column entries in a single row while OR gate
inputs are listed as row entries in a single column.

To illustrate the application of the Fussell-Vesely algorithm,
consider the example fault tree in figure 4-32. The top event
gate, G-0, is an OR gate; therefore the list matrix is begun by
listing its inputs on separate rows:

1
G-1
2

Since any one of these events can cause the top event to occur,
each will be a member of a separate cut set. Since G-1 is an OR
gate, it is replaced by its inputs listed in separate rows:

1
G-2
G-3
2

G-2 is an AND gate, hence when it is replaced its inputs are
entered in one row as:

1
G-4, G-5
G-3
2

Now replacing OR gate G-4 we get:

1
4, G-5
5, G-5
G-3
2

Replacing G-5 produces

1
4, 6
4, 7
5, 6
5, 7
G-3
2

Elimination of G-3 creates the list

1
4, 6
4, 7
5, 6
5, 7
3
G-6
2

The final list is obtained by replacing G-6.

1
4, 6
4, 7
5, 6
5, 7
3
6
8
2
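
The substitution procedure above can be sketched in a few lines. The gate table below is reconstructed from the replacement steps in the text rather than from figure 4-32 itself, so it should be read as an assumption; the function name is likewise illustrative and not part of any referenced program.

```python
# Gate table inferred from the worked substitutions; basic events are integers.
GATES = {
    "G-0": ("OR", [1, "G-1", 2]),
    "G-1": ("OR", ["G-2", "G-3"]),
    "G-2": ("AND", ["G-4", "G-5"]),
    "G-3": ("OR", [3, "G-6"]),
    "G-4": ("OR", [4, 5]),
    "G-5": ("OR", [6, 7]),
    "G-6": ("OR", [6, 8]),
}


def boolean_indicated_cut_sets(gates, top):
    """Replace gates by their inputs until every row holds only basic events."""
    rows = [[top]]
    while True:
        # Locate the first row that still contains a gate.
        target = None
        for index, row in enumerate(rows):
            gate = next((item for item in row if item in gates), None)
            if gate is not None:
                target = (index, row, gate)
                break
        if target is None:
            return [sorted(set(row)) for row in rows]
        index, row, gate = target
        kind, inputs = gates[gate]
        rest = [item for item in row if item != gate]
        if kind == "AND":
            # AND gate: inputs join the same row (the cut set grows).
            replacement = [rest + list(inputs)]
        else:
            # OR gate: one new row per input (more cut sets).
            replacement = [rest + [item] for item in inputs]
        rows[index:index + 1] = replacement


bics = boolean_indicated_cut_sets(GATES, "G-0")
```

Run on the assumed gate table, the rows come out in the same order as the final list of the worked example.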

The cut sets obtained by this algorithm are called Boolean
Indicated Cut Sets (BICS) since they will not be minimal unless
there are no replications of basic events. When basic events are
repeated, as in the example, then the BICS list generated by the
algorithm must be reduced by inspection. In the list of BICS for
the example, 6 is a cut set of size one, hence all larger cut sets
that contain 6, [4, 6] and [5, 6], are deleted, making the list of
minimal cut sets

1
2
3
6
8
4, 7
5, 7
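
The reduction by inspection amounts to discarding every cut set that contains a smaller cut set. A minimal sketch of that rule follows; the function name is illustrative only.

```python
def minimal_cut_sets(cut_sets):
    """Drop duplicates and any cut set that strictly contains another one."""
    unique = {frozenset(cs) for cs in cut_sets}
    minimal = [cs for cs in unique if not any(other < cs for other in unique)]
    # Order by size, then by content, to match the listing in the text.
    return sorted((sorted(cs) for cs in minimal), key=lambda cs: (len(cs), cs))


# BICS list taken from the worked example.
bics = [[1], [4, 6], [4, 7], [5, 6], [5, 7], [3], [6], [8], [2]]
mcs = minimal_cut_sets(bics)
```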

Although the Fussell-Vesely algorithm can, in theory, be applied
to the reduction of any fault tree, in practice its use is best
restricted to fault trees of modest size. Hand application of the
algorithm to large trees is quite tedious and likely to result in
errors. Computer programs to implement the algorithm are available
[18, 19]; however, other computerized methods [20, 21, 22, 23] may
prove more efficient for use with very large trees.


Another qualitative procedure that may be used in fault tree
analysis is the determination of minimal path sets. A path set is
a set of basic events whose non-occurrence ensures the
non-occurrence of the top event. A path set is minimal if it
cannot be further reduced and still remain a path set. Path set
determination may be used to identify areas in which redundancy
would be beneficial.

The first step in finding the minimal path sets for a fault tree
is to construct the dual (complement) of the tree by replacing OR
gates with AND gates and AND gates with OR gates in the original
tree and by replacing the occurrence of basic events with the
non-occurrence of those events. The minimal cut sets of the dual
tree are then obtained; these are the minimal path sets of the
original tree.
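
The dual-tree procedure can be sketched as follows. The gate table is the one inferred from the Fussell-Vesely example (an assumption, since figure 4-32 is not reproduced), and the helper names are illustrative. In the dual tree each basic event number stands for the non-occurrence of that event.

```python
from itertools import product

# Gate table inferred from the worked cut set example; an assumption.
GATES = {
    "G-0": ("OR", [1, "G-1", 2]),
    "G-1": ("OR", ["G-2", "G-3"]),
    "G-2": ("AND", ["G-4", "G-5"]),
    "G-3": ("OR", [3, "G-6"]),
    "G-4": ("OR", [4, 5]),
    "G-5": ("OR", [6, 7]),
    "G-6": ("OR", [6, 8]),
}


def cut_sets(gates, node):
    """Return the cut sets of the subtree under node as a set of frozensets."""
    if node not in gates:
        return {frozenset([node])}
    kind, inputs = gates[node]
    child_sets = [cut_sets(gates, item) for item in inputs]
    if kind == "OR":
        return set().union(*child_sets)
    # AND gate: combine one cut set from each input.
    return {frozenset().union(*combo) for combo in product(*child_sets)}


def minimize(sets):
    """Keep only the sets that contain no smaller member of the collection."""
    return {s for s in sets if not any(other < s for other in sets)}


def dual(gates):
    """Swap AND and OR gates; basic events are read as non-occurrences."""
    swap = {"AND": "OR", "OR": "AND"}
    return {name: (swap[kind], inputs) for name, (kind, inputs) in gates.items()}


minimal_paths = minimize(cut_sets(dual(GATES), "G-0"))
path_list = sorted(sorted(s) for s in minimal_paths)
```

For the assumed tree, two minimal path sets result: keeping events 1, 2, 3, 6 and 8 from occurring is always required, together with either event 7 or both events 4 and 5.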

f.  Quantitative  Analysis 

After  a  fault  tree  has  been  constructed  and 
reduced  to  minimal  cut  sets,  the  probability 
of  occurrence  of  the  top  event  can  be  deter¬ 
mined  by  propagating  the  probabilities  of  oc¬ 
currence  of  the  basic  events  upward  through 
the  tree. 

The probability of occurrence of the output event of an AND gate
is found by multiplying the probabilities of occurrence of the
input events. The probability of occurrence of the output event of
an OR gate is approximately equal to the sum of the probabilities
of occurrence of the input events. The exact probability for the
output of an OR gate may be found from

$$P(O) = \sum_{i=1}^{n} p_i - \sum_{i<j} p_i p_j + \sum_{i<j<k} p_i p_j p_k - \cdots + (-1)^{n+1} p_1 p_2 \cdots p_n$$

which for two inputs is

$$P(O) = P(A) + P(B) - P(A)P(B)$$

for three inputs is

$$P(O) = [P(A) + P(B) + P(C)] - [P(A)P(B) + P(A)P(C) + P(B)P(C)] + [P(A)P(B)P(C)]$$

and for four inputs is

$$\begin{aligned}
P(O) = {} & [P(A) + P(B) + P(C) + P(D)] \\
& - [P(A)P(B) + P(A)P(C) + P(A)P(D) + P(B)P(C) + P(B)P(D) + P(C)P(D)] \\
& + [P(A)P(B)P(C) + P(A)P(B)P(D) + P(A)P(C)P(D) + P(B)P(C)P(D)] \\
& - [P(A)P(B)P(C)P(D)]
\end{aligned}$$
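
The alternating sums above can be generated for any number of inputs. The following sketch assumes independent inputs and uses illustrative function names; the complement form 1 - (1 - p1)(1 - p2)... is included only as a cross-check and is not a formula given in this manual.

```python
from itertools import combinations
from math import prod


def or_gate_exact(probs):
    """Alternating sum over all k-way products of the input probabilities."""
    total = 0.0
    for k in range(1, len(probs) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        total += sign * sum(prod(combo) for combo in combinations(probs, k))
    return total


def or_gate_complement(probs):
    """Cross-check: one minus the probability that no input occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


three_inputs = or_gate_exact([0.1, 0.2, 0.3])
four_inputs = or_gate_exact([0.1, 0.2, 0.3, 0.4])
```

The number of product terms grows as 2^n - 1, which is why the text cautions against the exact method for gates with many inputs.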

The exact method is difficult to employ, either by hand or by
computer, for a large number of inputs and should be used only
when the approximation is not sufficiently accurate. The accuracy
of the approximation is a function of both the number of inputs
and the magnitude of the input probabilities; it decreases with
increasing number of inputs and with increasing input
probabilities. The approximate value will always be larger than
the true value, with an error of about 5% for 10 inputs with
probabilities of 1 x 10^-2. The error percentage will change by
one order of magnitude for each order of magnitude change in
either the number of inputs or the input probabilities.
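
The size of the approximation error can be checked numerically. The sketch below assumes independent inputs with equal probabilities and takes the quoted case to be 10 inputs at 1 x 10^-2, the reading under which the error comes out near 5%; the function names are illustrative.

```python
from math import prod


def or_gate_approx(probs):
    """Rare-event approximation: sum of the input probabilities."""
    return sum(probs)


def or_gate_exact(probs):
    """Exact value for independent inputs."""
    return 1.0 - prod(1.0 - p for p in probs)


def percent_error(probs):
    """Overestimate of the approximation, as a percentage of the exact value."""
    exact = or_gate_exact(probs)
    return 100.0 * (or_gate_approx(probs) - exact) / exact


error_p_1e2 = percent_error([1e-2] * 10)   # 10 inputs at 1 x 10^-2
error_p_1e3 = percent_error([1e-3] * 10)   # probabilities ten times smaller
```

Reducing the input probabilities by a factor of ten reduces the percentage error by roughly the same factor, in line with the order-of-magnitude rule stated above.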

It must be remembered that the AND gate computational procedure
and the exact method for OR gates both require that all gate
inputs be independent. This condition is ensured throughout the
tree if either there are no repeated basic inputs or the
replications are eliminated by reduction of the tree to minimal
cut sets. To illustrate the importance of meeting this condition,
consider the fault tree in figure 4-30 and let P(D) = P(E) = P(F)
= 1 x 10^-6. Then P(B) = P(C) = 2 x 10^-6 and
P(A) = [P(B)] [P(C)] = 4 x 10^-12, with the calculation of P(A)
violating the condition of independence, since D is an ultimate
cause of both events B and C. P(A), correctly computed from the
minimal cut sets of figure 4-31, is 1.000001 x 10^-6. The error
encountered in using the AND gate multiplicative rule incorrectly
will depend on the number of replications involved as well as the
value of the input probabilities involved. Over the range of input
probabilities typically found in fault trees, the error can always
be expected to be greater than 50%.
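
The effect of the repeated event can be reproduced numerically. The sketch assumes the figure 4-30 layout inferred from the text (B = D OR E, C = D OR F, A = B AND C) and input probabilities of 1 x 10^-6; variable names are illustrative.

```python
p = 1e-6  # assumed common probability of basic events D, E and F

# Gate-by-gate propagation, which wrongly treats B and C as independent.
p_b = p + p - p * p          # B = D OR E
p_c = p + p - p * p          # C = D OR F
p_a_wrong = p_b * p_c        # A = B AND C, ignoring the shared event D

# Propagation through the minimal cut sets {D} and {E, F} of figure 4-31.
p_ef = p * p                          # E AND F
p_a_correct = p + p_ef - p * p_ef     # D OR (E AND F)

underestimate_factor = p_a_correct / p_a_wrong
```

Under these assumptions the incorrect calculation understates the top event probability by a factor of roughly 250,000.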


The computation of top event probability is made easier in the
case of large fault trees by use of one of several available
computer programs [19, 21, 22, 23].


4.2.7  Comparison  of  FMECA  and  FTA 
Methods 

The  FMECA  analyst  determines  the  assem¬ 
bly  level  at  which  the  FMECA  design  analysis 
is  to  begin.  Failure  modes  are  postulated  for 
each  element  at  that  level  and  the  effects  of 
the  failure  modes  are  traced  upward.  The 
analysis  is  inductive;  reasoning  is  from  the 
particular  to  the  general.  There  is  little 
chance  for  omission  because  each  failure 
mode  of  each  hardware  element  is  examined 
in  turn. 

A  principal  advantage  of  the  FMECA  is  its 
procedural  simplicity.  The  analytical  process 
is  straightforward  and  permits  complete  and 
orderly  evaluation  of  a  design. 

Disadvantages of FMECA are that the method considers only single
failures and cannot readily examine the effects of human errors or
other factors external to the system. Nor does it lend itself to
assessing secondary effects of a failure. For example, a power
supply may become overloaded due to a short circuited crystal; the
overload, in turn, may result in a reduction of output from the
power supply, causing other system effects which may not be
readily discerned from the FMECA process. Multiple hardware
failures or human errors can also result in consequences difficult
to identify by FMECA.

Reference  is  sometimes  made  to  “top 
down”  FMECA,  and  this  is  often  confused 
with  fault  tree  analysis.  In  a  top  down 
FMECA,  modes  of  failure  of  an  assembly  are 
postulated  and  their  effects  on  personnel  and 
equipment  mission  are  determined.  The  fail¬ 
ure  modes  are  then  traced  back  through  the 
assembly  to  determine  their  causes. 

Fault tree analysis begins with selection of an undesired event.
The analyst then works downward to identify possible hardware
malfunctions and human or software errors that lead to the
undesired event. The analysis is carried down to a hardware or
operating level where failure rate data are available or can be
developed. The method is deductive; reasoning is from the general
to the particular. Advantages of the FTA are its ability to
encompass external factors such as human error, sabotage or
natural disasters within the formalism of the analysis. Fault
trees also tend to be easy to read and interpret. A major
disadvantage of the FTA method is that there is no way to be sure
every fault path has been included in the analysis. Examples of
fault tree development are illustrated in figures 4-33 and 4-34.

It can readily be seen that FTA conveniently handles the
machine-operator interfaces. However, it is not easy to be sure
that all possible causes of the top event have been considered.
And it is necessary to draw a fault tree for each top event of
interest. It is not usually convenient to model a combination of
undesired events in a single fault tree because the same item may
appear at several points in the tree.

A  fault  tree  has  direct  visual  impact  and  for 
that  reason  is  often  useful  for  summarizing  a 
FMECA.  Thus,  a  FTA  is  often  provided  in  the 
management  summary  of  a  FMECA.  A  fault 
tree  can  aid  in  the  development  of  repair  and 
test  manuals  by  providing  a  graphic  means  for 
tracing  from  a  system  fault  to  the  associated 
hardware  failure(s). 

Figure 4-35 compares the FMECA and FTA methods.


Figure 4-33. Example of Fault Tree Development
(Legible event labels: FUEL SYSTEM MALFUNCTION, IGNITION FAILURE.)

Figure 4-34. Portion of RB Deployment Fault Tree

4.3 ANALYSIS OF INTEGRATED TEST
PROGRAM FOR RELIABILITY AND
AVAILABILITY EVALUATION

The general practice on FBMWS/SWS programs has been to minimize
the planning of tests specifically for reliability or availability
demonstration, while making maximum use of data evolved from tests
performed for other reasons. The integrated test program approach
reflects this philosophy. NAVORD OD 42282[24], Integrated Test
Program Manual, describes the approach in detail. When this
approach, an efficient method for major weapon systems, is used,
the evaluation task is to analyze the integrated test program.
There are two major sub-tasks: test identification and test
evaluation.

4.3.1  Test  Identification 

Throughout the development and pilot production phases of a
program, the contractor's test program must provide the data for
reliability and availability evaluation. An Integrated Test
Program Plan (ITPP) reflecting planning and control of all testing
activities is required by SSPO. The ITPP sets forth the purposes
and extent of the test program and is an essential input to
planning for reliability and availability evaluation. The value of
planned tests for reliability and availability measurement can be
estimated by analysis of the ITPP, which can then be expanded or
amended as program requirements may dictate. Specific tests that
will contribute data for reliability and availability measurement
are first identified. This is accomplished by completing a form
such as the Test Identification Form shown in figure 4-36.

To complete the analysis, it is necessary to resolve any questions
that may arise as to whether tests of individual equipments or
other portions of the system will be employed for reliability
measurement, as well as tests of the full system configuration. In
general, tests of hardware at lower assembly levels can be
accepted as contributing to the evaluation data base, whenever the
criteria of substantial mission equivalence can be satisfied with
respect to operating environment, functional use and maintenance
conditions. With these criteria as guidance, analysis of planned
tests can readily be made.

Test Ident. No. | Type of Test | Level of Test | Purpose of Test | Hardware Involved | Test Duration | Cycles/Operating Time | Pass/Fail Criteria | Instrumentation Requirements | Data to be used for Reliability/Availability Evaluation

Figure 4-36. Reliability/Availability Test Identification Form

The  first  step  in  the  analysis  is  to  evaluate 
each  planned  test  to  determine  its  purpose, 
the  hardware  involved,  and  the  estimated 
duration  of  the  test.  This  is  done  by  filling 
out  the  test  identification  form.  A  decision 
whether  to  use  the  data  for  reliability  and 
availability  measurement  is  based  on  this 
evaluation.  If  the  data  are  to  be  used,  then 
specific  pass/fail  criteria  and  cycle/operating 
time  designation  for  each  portion  of  the  test 
should  be  stated  and  any  special  instrumenta¬ 
tion  requirements  indicated.  The  test  identifi¬ 
cation  form  contains  the  following  fields: 

Test  Identification  Number  -  a  number 
assigned  to  fully  identify  the  test. 

Type  of  Test  —  List  the  type  of  test  such  as 
Development,  Engineering  Evaluation,  or 
Qualification. 

Level  of  Test  —  indicate  whether  the  test 
is  at  system,  subsystem,  equipment,  or  com¬ 
ponent  level. 

Purpose  of  Test  -  summarize  the  purpose 
of  the  test  and  include  a  reference  to  the 
particular  paragraph  of  the  test  plan  that 
describes  the  test  in  detail. 

Hardware  Involved  -  list  the  component 
breakdown  for  equipment,  subsystem  and 
system  tests.  Where  a  standard  configuration 
is  involved,  this  can  be  a  reference  to  a  stan¬ 
dard  list. 

Test  Duration  -  estimate  the  amount  of 
operating  or  environmental  exposure  time 
that  will  be  accumulated  during  the  test.  Pro¬ 
vide  separate  estimates  for  each  component 
if  this  is  necessary  for  system,  subsystem  or 
equipment  tests. 

Cycle/Operating  Time  -  define  whether 
the  test  results  are  to  be  reported  as  cycles 
and/or  operating  time.  This  decision  is  made 
from  review  of  the  mission  profile. 

Pass/Fail Criteria - define specific criteria for determining
whether the test should be considered a success or failure. These
might be a time or environmental level, a threshold, or specific
readings for particular performance parameters.


Instrumentation  Requirements  -  indicate 
the  basis  for  measuring  operating  time  or 
cycles,  such  as  the  time  the  power  supply  is 
activated  or  the  number  of  times  a  particular 
switch  is  actuated. 

Data to be used for Reliability/Availability Evaluation -
indicate, by inserting a "yes" or "no" in this column, whether the
test data will be used for reliability and availability
measurement purposes.

4.3.2  Test  Evaluation 

The expected contribution of the tests selected for reliability
and availability evaluation is estimated by completing a form such
as the Test Evaluation Form shown in figure 4-37.

A  review  of  the  tests  selected  for  use  in 
reliability  and  availability  evaluation  should 
assure  that  adequate  sample  sizes  are  pro¬ 
vided;  that  is,  the  data  used  for  evaluation  is 
not  limited  to  repeated  testing  of  1  or  2 
units. 

From the test evaluation form, a summation is made of the
estimated test times for a component in each level of test:
component, equipment, subsystem, and system. This result is
converted into equivalent missions through multiplication by the
component alpha value obtained from the mission profile. A
comparison of the estimated number of equivalent missions with the
number necessary to demonstrate the apportioned
reliability/availability at the desired confidence level will
indicate whether the test program for that component is adequate.
This evaluation should be done for each significant mission
environment to which the component will be exposed. Each component
and equipment group should be evaluated in turn.
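
The conversion to equivalent missions can be sketched as below. All numbers are hypothetical, and the sketch assumes the alpha value is expressed in missions per test hour; the units actually used come from the mission profile and are not restated in this section.

```python
def equivalent_missions(test_hours_by_level, alpha):
    """Sum the estimated test time over all test levels and scale by alpha."""
    total_hours = sum(test_hours_by_level.values())
    return total_hours * alpha


# Hypothetical estimated test times (hours) for one component in one environment.
estimated_hours = {
    "component": 400.0,
    "equipment": 300.0,
    "subsystem": 200.0,
    "system": 100.0,
}
alpha_value = 0.5          # hypothetical: missions per test hour
required_missions = 460.0  # hypothetical number needed for the demonstration

missions = equivalent_missions(estimated_hours, alpha_value)
program_adequate = missions >= required_missions
```

The same calculation would be repeated for each significant mission environment and for each component and equipment group.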

In the exponential case, an estimate of the reliability lower
bound which the planned testing would produce is obtained by
multiplying the predicted failure rate (failures per mission) by
the estimated equivalent missions to be produced by the planned
testing. This product is the expected number of failures. Using
this value of the expected number of failures and the estimated
equivalent missions, enter tables such as NAVWEPS OD 30668[25] to
determine the lower bound the planned data is expected to produce.
If this is too far below the predicted value, consideration can be
given to planning additional tests.

Test Ident. No. | Hardware Name | Test Environment | Estimated Test Time (System, Subsystem, Equipment, Component, Total) | Alpha Value | Estimated Equivalent Missions | Estimated RL

Figure 4-37. Reliability/Availability Test Evaluation Form

Example

Predicted Failure Rate = .006 failures/mission

Estimated Equivalent Missions = 1,000 missions

The product (.006) (1,000) = 6 failures

Selecting the eighty percent confidence level from tables [25] or
Appendix E figure E-2,

RL = 0.9910

which can be compared with R' = 0.9940, the predicted value.

Similar procedures can be used for other distributional forms.
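
The tabulated value in the example can be approximated without tables. The sketch below assumes the tables are built on the usual Poisson (chi-square with 2r + 2 degrees of freedom) upper bound for a time-terminated exponential test, and that the expected number of failures is rounded to an integer count; both are assumptions about the tables, although they do reproduce the quoted 0.9910. Function names are illustrative.

```python
from math import exp, factorial


def poisson_cdf(r, mean):
    """Probability of r or fewer failures when the expected number is mean."""
    return sum(exp(-mean) * mean ** i / factorial(i) for i in range(r + 1))


def upper_bound_expected_failures(r, confidence):
    """Solve poisson_cdf(r, m) = 1 - confidence for m by bisection."""
    target = 1.0 - confidence
    low, high = 0.0, 10.0 * (r + 1) + 50.0
    for _ in range(200):
        mid = (low + high) / 2.0
        if poisson_cdf(r, mid) > target:
            low = mid    # bound lies at a larger mean
        else:
            high = mid
    return (low + high) / 2.0


def reliability_lower_bound(failure_rate, missions, confidence):
    """Lower bound on single-mission reliability for the planned test exposure."""
    failures = round(failure_rate * missions)
    m_upper = upper_bound_expected_failures(failures, confidence)
    return exp(-m_upper / missions)


r_lower = reliability_lower_bound(0.006, 1000, 0.80)
r_predicted = exp(-0.006)
```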

The test evaluation form, figure 4-37, contains the following
fields:

Test Identification Number - a number assigned to fully identify
the test.

Hardware Name - state the equipment being evaluated. Evaluate the
equipment first, then the components within it, then the next
equipment and its components, etc.

Test Environment - list the mission environments to be evaluated
for each hardware element.

Estimated Test Time - list the estimated test time totals for each
level of test (system, subsystem, equipment, or component) and
also the sum.

Alpha  Value  —  transfer  the  appropriate 
alpha  value  for  the  environment  from  the 
mission  profile. 

Estimated  Equivalent  Missions  -  this  is  the 
product  of  the  alpha  value  and  the  sum  of  the 
estimated  test  time/environment. 

The  number  of  equivalent  missions  should 
be  approximately  equal  for  a  balanced  test 
program.  Since  the  ITP  approach  makes  use 
of  all  applicable  test  data  including  tests  not 
planned  specifically  for  reliability/availability 
evaluation,  it  is  possible  for  imbalance  to 
occur.  The  analyst  should  understand  the 
reasons  for  an  unbalanced  program  when  it 
occurs. 

Estimated RL - this column provides the estimated lower bound on
reliability at the desired confidence level which the test program
is expected to produce.

Evaluation of the integrated test program may indicate an
unbalanced plan (i.e., widely varying values of estimated
equivalent missions at the same hardware level) or insufficient
testing (i.e., a value of RL far below the predicted value R').

Analysis  of  the  reasons  for  an  unbalanced 
program  or  insufficient  testing  of  some  items 
often  leads  to  recommendations  for  im¬ 
proving  the  ITPP. 

4.3.3  Data  Classification 

Test  data  (time  and  failure  information) 
from  tests  considered  non-relevant  for  reliabil¬ 
ity  purposes  are  excluded  for  purposes  of 
evaluation.  The  failures  should  be  analyzed 
and  reported  in  the  failure  summary  report 
and  should  be  classified  non-relevant  since 
they  come  from  a  non-relevant  test. 

Test  data  from  tests  considered  relevant  for 
reliability  purposes  can  be  used  for  evalua¬ 
tion.  Rules  for  establishing  the  relevance  of 
failures  in  these  tests  should  be  carefully 
established.  It  is  often  desirable  but  not 
always  accurate  to  eliminate  (consider  non- 
relevant)  failures  due  to  human  error,  test 
equipment  error,  and  similar  causes  even 
when  the  failure  occurs  in  relevant  test  pro¬ 
grams.  The  contractor  should  establish,  in 
his  reliability  evaluation  plan  the  rules  to  be 
used  for  this  purpose. 

Elimination of relevant failures may be desirable after corrective
action has been incorporated to eliminate a failure mode. (Note:
this should not be permitted when reliability growth models are
being employed, as the growth model requires these data.) The
contractor should establish criteria for the amount of
failure-free test data required on the new design before the
relevant failures can be eliminated. Provisions for re-inserting
all failures removed must be available if the failure mode recurs.
The contractor should also establish the policy for using the test
time when failures are removed.

All  failures  should  be  reported  in  the  fail¬ 
ure  summary  report.  The  failure  classification, 
non-relevant,  relevant,  or  non-relevant  pre¬ 
viously  classified  relevant,  defines  failures  to 
be  used  in  reliability/availability  calculations. 
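
The classification rules above can be sketched as a small filter. This is an illustrative sketch only: the record fields and the names `FailureClass`, `FailureRecord`, `failures_for_calculation`, and `reinstate_failure_mode` are hypothetical, and it assumes that only failures currently classified relevant enter the calculations while every failure stays in the failure summary report.

```python
from dataclasses import dataclass
from enum import Enum


class FailureClass(Enum):
    NON_RELEVANT = "non-relevant"
    RELEVANT = "relevant"
    # Removed after corrective action; re-inserted if the failure mode recurs.
    NON_RELEVANT_PREVIOUSLY_RELEVANT = "non-relevant, previously classified relevant"


@dataclass
class FailureRecord:
    item: str
    failure_mode: str
    classification: FailureClass


def failures_for_calculation(records):
    """Return the failures counted in reliability/availability calculations."""
    return [r for r in records if r.classification is FailureClass.RELEVANT]


def reinstate_failure_mode(records, failure_mode):
    """Re-insert previously removed failures when a failure mode recurs."""
    for r in records:
        if (r.failure_mode == failure_mode
                and r.classification is FailureClass.NON_RELEVANT_PREVIOUSLY_RELEVANT):
            r.classification = FailureClass.RELEVANT
    return records
```

Every record stays in the list (the summary report); only the classification decides whether it is counted.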
Section 5

ASSESSMENT  OF  COMPONENT  RELIABILITY 


This section deals with techniques for
assessing the reliability of components, based
on test data generated in component-level
tests. It identifies those failure models and
methods most effective in reliability assessments
and provides guidelines for their selection
and application. The section provides a
road map (Figure 5-4) approach for selection
of the model most appropriate for the data
being assessed. Assessment consists of obtaining
point and interval estimates of reliability,
failure rate, and MTBF from component test
data and, when reliability growth is present,
of obtaining trend lines and reliability bounds
reflecting the uncertainty in trend lines as a
function of calendar time.

Software  reliability  assessment  is  consid¬ 
ered  in  section  6  which  contains  all  informa¬ 
tion  on  software,  and  component  availability 
is  treated  in  section  7,  the  system  assessment 
section,  since  availability  is  most  meaning¬ 
fully  stated  at  the  system  or  equipment  levels. 

A  number  of  statistical  terms  and  symbols 
used  in  this  section  are  defined  in  more  detail 
in  figure  5-1  than  was  presented  in  the  glos¬ 
sary. 

5.1  OVERVIEW  OF  THE  ASSESSMENT 
PROBLEM 

Reliability  information  consists  generally  in 
a  set  of  times  to  failure,  cycles  to  failure, 
stress/strength  parameter  performance  and 
pass/fail  data  for  one  or  several  items  on  test. 
Using these data, point, interval, or trend
estimates of reliability parameters or measures
such as λ, θ, or R are made.

There are many types of components,
types of tests, and ways of testing that provide
data for reliability assessment. There are also
many probabilistic models of failure used
in reliability assessment, such as the
exponential, binomial, normal, lognormal,
Weibull, gamma, beta, and extreme-value
distributions cited or referenced in this section.
The most applicable model for a given situation
is sometimes known or assumed in advance of testing
from an engineering analysis of the failure/repair
process or from failure data/model
analysis for similar components previously
tested. It is always preferable to verify any
assumed failure model by a goodness-of-fit
test of the new failure data to the model
when sufficient data have been accumulated.

In  development  programs  there  is,  addition¬ 
ally,  the  possibility  of  reliability  growth 
with  its  measures  characterized  by  trends  and 
testing  schedules  dependent  on  calendar 
time. 

Given  these  hardware  and  testing  consid¬ 
erations,  statistical  methods  are  also  diverse, 
featuring:  the  classical  approach  with  deci¬ 
sions  and  numerical  results  dependent  re¬ 
spectively  on  tests  of  hypothesis  and  on 
point  and  interval  estimates  inferred  strictly 
from  the  test  data  at  hand;  the  variables  ap¬ 
proach  which  replaces  the  classical  pass/fail 
concept  with  the  concept  of  critical  contin¬ 
uous  parameters,  which  are  often  normally 
distributed  and  which  are  either  “within 
specs”  or  “out  of  specs”;  and  the  Bayesian 
approach  which  allows  the  inclusion  of  prior 
information  in  specified  dosage  of  strength 
by  means  of  “prior”  distributions  and  param¬ 
eters,  and  the  update  of  this  information  with 
current  test  data  in  the  form  of  “posterior” 
distributions  and  parameters. 

The  “real  world”  of  hardware  and  testing 
is  related  in  figure  5-2  to  the  probabilistic 
models  of  failure/repair  and  to  the  assessment 
methods  described  in  this  section.  The  first 
two  columns  of  figure  5-2  show,  respectively, 
the  categories  of  components  found  in 
complex  systems,  and  examples  of  such 
components found in the Navy’s inventory.
The third and fourth columns show, respectively,
the types of tests and success criteria
used in assessment. Notice that some of the
tests, such as integrity or performance tests,
may be used as “piggyback” tests for failure/repair
data, but that they may be insufficient
for a thorough assessment of reliability and
availability or for a true reliability/availability
demonstration (see Section 8) if they do not
have the sample sizes or the homogeneity of
conditions necessary to demonstrate specified
reliability/availability values at specified
confidence levels. The data, however, may
still be usable for a preliminary assessment.
Column five identifies the primary measures
of equipment behavior used in reliability
assessments. As is to be expected, these measures
are different for each equipment category.
Column six shows the most applicable
distributions as models of failure by equipment
category.

In this section only the exponential, binomial,
normal, lognormal, and Weibull
distributions are the object of a full assessment
description. Their statistical descriptions
are given in § 5.2. The binomial distribution
is used mostly for data from pass/fail
tests and for time-truncated cycles-to-failure
tests; the exponential models failure/repair
data with constant failure/repair rate (CFR);
the normal models data from “aging” components
with increasing failure rate (IFR) and
the variables approach to reliability assessment;
the lognormal models skewed failure/repair
data with increasing failure/repair
rates (IFR); and the Weibull models failure
data with decreasing (DFR), constant (CFR), or
increasing (IFR) failure rate characteristics.

Other probability distributions used in
assessment, such as the Poisson, the gamma,
the beta, the hypergeometric, the inverted
gamma, and the ubiquitous t, F, and χ², are
either mentioned or used with appropriate
references.

Finally,  column  seven  of  figure  5-2  lists 
the  candidate  methods  for  reliability  assess¬ 
ment.  The  Rubinstein  method  mentioned  in 
this  column  is  applicable  to  exponential 
components  with  data  originating  from 


testing one component at a time in mixed
censoring life tests, tests which are quite
general since they are inclusive of both Type
I and Type II tests described below. The
method allows assessment of reliability for
components tested for different failure modes
and for different operating conditions.

5.1.1  Types  of  Tests  by  Method  of 
Implementation  or  Stoppage 

It  is  important  to  consider  the  various  ways 
tests  can  be  implemented,  and  test  data  ob¬ 
tained,  since  the  assessment  formulas  used 
in  the  remainder  of  this  section  are  critically 
dependent  on  the  manner  in  which  test  data 
have  been  obtained. 

(a) In Uncensored Life Tests, which are
sometimes costly or difficult to schedule, n
identical items are placed on test and are
monitored for times to failure (e.g., in minutes)
or for cycles to failure until all items
have failed.

(b)  In  Type  I  Life  Censoring  Tests,  or 
Time  Truncated  Tests,  n  identical  items  are 
placed  on  test  for  a  predetermined  amount  of 
test  time  T  or  number  of  cycles,  and  the 
times  to  failure  (e.g.  in  minutes)  or  the  cycles 
to  failure  of  the  x  items  which  fail  are  re¬ 
corded,  unless  there  are  no  failures.  These 
tests  can  be  implemented  with  or  without 
replacement  of  a  failed  item. 

(c)  In  Type  II  Life  Censoring  Tests  or 
Tests  to  Failure,  n  identical  items  are  placed 
on  test  until  a  predetermined  number  of 
failures  (x)  occur.  The  times  or  cycles  to  fail¬ 
ure  of  the  x  items  are  recorded.  These  tests 
can  be  implemented  with  or  without  replace¬ 
ment  of  a  failed  item. 

(d)  In  One  Component  at  a  Time  Mixed 
Censoring  Life  Tests,  which  are  used  for 
bulky  or  expensive  components,  or  when 
only  a  single  testing  device  is  available,  test¬ 
ing  may  take  place  one  component  at  a  time. 
The  test  may  terminate  either  by  failure  or 
accumulation  of  planned  test  times.  Notice 
that  data  from  both  (b)  and  (c)  above  are 
consistent  with  this  manner  of  testing,  but 
that  the  reverse  is  not  true. 
(e) In Stress/Strength Tests, the test is
generally independent of time. The data
consist of applied stresses (e.g., in pounds
per square inch (psi)) versus the stress percentile,
and of the strength (e.g., in psi) versus the
strength percentile.

(f) In Life Tests for Repairable Items,
where items may be bulky and expensive, n
items are often monitored through long periods
of operating life where their times/cycles
to failure and times to repair are recorded,
as well as the calendar time of failure,
repair, overhaul and other important events.

(g) In Pass/Fail Tests, N items out of a
grand total of M items (which may be taken
to be infinite if M ≫ N) are tested. The result
of the test is time independent. The items
are classified as x “defectives” and s = N − x
“non-defectives”.
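
The remark in (g) that M may be taken as infinite when M ≫ N can be checked numerically. The sketch below is an illustration, not part of the source procedure: it compares the hypergeometric probability (sampling N items from a finite lot of M) with the binomial probability (infinite lot). The function names and the 10% defective fraction are assumptions chosen for the demonstration.

```python
from math import comb


def hypergeom_pmf(x, lot_size, lot_defectives, sample_size):
    """P(exactly x defectives) when sampling without replacement from a finite lot."""
    return (comb(lot_defectives, x)
            * comb(lot_size - lot_defectives, sample_size - x)
            / comb(lot_size, sample_size))


def binomial_pmf(x, sample_size, p_defective):
    """P(exactly x defectives) in N independent trials (lot treated as infinite)."""
    return (comb(sample_size, x)
            * p_defective ** x
            * (1.0 - p_defective) ** (sample_size - x))


N, x = 20, 1                      # hypothetical sample size and defective count
for M in (100, 1_000, 100_000):   # lot sizes; 10% of each lot assumed defective
    print(M, hypergeom_pmf(x, M, M // 10, N), binomial_pmf(x, N, 0.1))
```

As M grows relative to N, the two probabilities converge, which is why the simpler binomial model is normally used for pass/fail data.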

Many  other  types  of  tests,  and  of  test  data 
exist,  but  only  the  situations  described  above 
will  be  considered  in  this  section. 

5.1.2  Influence  of  Equipment  Maturity  and 
Category  of  Life  Cycle  Testing  Upon 
Applicable  Reliability  Assessment 
Methodology  and  Failure  Models 

As  previously  discussed  in  §  5.1  and  docu¬ 
mented  in  figure  5-2,  the  most  applicable 
failure  models  and  reliability  assessment 
methods  are  functions  of  the  type  of  equip¬ 
ment  subjected  to  test.  While  this  is  true,  the 
most  applicable  models  and  methods  are 
also  influenced  significantly  by  the  maturity 
of  the  product  design  at  the  time  of  its  test 
and  the  kind  of  life  cycle  test  involved.  This 
dependence  is  described  in  figure  5-3. 

5.1.3 Quality of Point and Interval
Estimators 

When speaking of either Classical, Bayesian,
or Variables methods, an important topic is
the quality of the point and interval estimators
used for reliability and availability assessment.
There are, for instance, Maximum Likelihood (ML),
Minimum Variance Unbiased (MVU), Best
Linear Unbiased (BLU), Maximum Information
(MI), Best Asymptotic Normal (BAN),
and many other methods [4, 9] used to
formulate estimators. These lead to estimators
endowed with lesser or greater
amounts of desirable qualities, such as unbiasedness,
consistency, asymptotic efficiency,
efficiency, minimum variance, sufficiency,
and invariance [3, 4, 11].

In  many  cases,  a  method  is  selected  only 
because  it  is  the  only  tractable  one,  that  is, 
it  gives  an  estimator  where  the  other  methods 
could  not  because  of  inherent  mathematical 
difficulties. 

The  quality  of  estimators  most  sought  out 
is  unbiasedness,  which  is  the  quality  that  the 
expected  value  of  the  estimate  of  a  param¬ 
eter  or  the  reliability  and  availability  mea¬ 
sure  is  equal  to  the  parameter  or  measure 
being  estimated. 

Another  valuable  quality  is  minimum 
variance  unbiasedness,  that  is,  the  property 
of  an  estimator  to  cluster  as  closely  as  pos¬ 
sible  about  the  true  value  of  the  quantity 
which  is  being  estimated. 
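
As a worked illustration (a standard result, not taken from this document), consider n complete times to failure t_1, ..., t_n from an exponential model with MTBF θ = 1/λ. The sample mean is an unbiased estimator of θ, whereas its reciprocal is a biased estimator of λ:

```latex
\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} t_i , \qquad E[\hat{\theta}] = \theta ;
\qquad
\hat{\lambda} = \frac{1}{\hat{\theta}} , \qquad
E[\hat{\lambda}] = \frac{n}{n-1}\,\lambda \quad (n > 1).
```

Multiplying the reciprocal estimator by (n − 1)/n removes the bias, which shows that an unbiased estimator of one parameter does not automatically give an unbiased estimator of a function of that parameter.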

Also,  it  should  be  noted  that  while  some 
point  and  interval  estimation  methods  are 
computationally  straightforward,  others,  par¬ 
ticularly  interval  estimations  with  certain 
types  of  test  data,  are  state-of-the-art  or 
cannot  be  performed  exactly  without  exces¬ 
sive  computational  labor.  A  number  of  situ¬ 
ations  can  be  tackled  only  by  making  simpli¬ 
fying  assumptions,  by  settling  for  asymptotic 
results  rather  than  exact  ones,  by  using  num¬ 
erical  approximations,  or  by  Monte-Carlo 
simulation. 

In  many  cases,  however,  the  lack  of  quality 
or  quantity  of  test  data  (such  as  may  arise 
from  “piggyback”  tests)  does  not  warrant 
seeking  out  the  most  exacting  methods.  In 
these  cases  assumptions  of  exponentiality  or 
normality,  or  neglect  of  test  conditions  can 
be  entertained,  provided  that  the  results  ob¬ 
tained  with  such  assumptions  are  presented 
with  an  estimate  of  all  errors,  including  the 
assumptional  errors. 

Graphical methods should not be neglected
in this connection and must not be underrated.
Whenever possible, graphical methods
should be backed by analytical techniques
(e.g., goodness-of-fit tests), but, as indicated
in [22], even the analytical techniques cannot
distinguish significantly between similarly
shaped Weibull, lognormal, or normal distributions
with fewer than 40 samples.
5.2 STATISTICAL DISTRIBUTIONS USED
AS MODELS OF FAILURE (OR
REPAIR)

This  subsection  presents  in  pictorial  and 
narrative  form  the  statistical  properties  and 
applicability  of  some  common  distributions 
used  as  models  of  failure  (or  repair).  Distribu¬ 
tions  covered  and  their  order  of  coverage  are: 
§5.2.1  Exponential  Distribution 
§  5.2.2  Binomial  Distribution 
§  5.2.3  Normal  Distribution 
§  5.2.4  Lognormal  Distribution 
§  5.2.5  Weibull  Distribution 
In  addition  §  5.2.6  identifies  many  other 
distributions  that  have  found  application  to 
reliability  and  availability  modeling  in  special 
situations. 


5.2.1  The  Exponential  Distribution 

5.2.1.1 Without a Warrantee Period


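PDF: $f(t) = \lambda e^{-\lambda t} = (1/\theta)\, e^{-t/\theta}$ for $t \ge 0$ ($\lambda, \theta > 0$); $f(t) = 0$ for $t < 0$

CDF: $F(t) = 1 - e^{-\lambda t} = 1 - e^{-t/\theta}$, $t \ge 0$

R(t): $R(t) = e^{-\lambda t} = e^{-t/\theta}$, $t \ge 0$

h(t): $h(t) = \lambda$ or $1/\theta$

MTTF: $1/\lambda$ or $\theta$

Variance: $1/\lambda^2$ or $\theta^2$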


Process: 

The basic failure process underlying the
exponential is the Poisson failure process
$g(x,t) = e^{-\lambda t} (\lambda t)^x / x!$, where $g(x,t)$ represents
the probability that exactly x failures
will occur in the time interval from 0 to t.
Because reliability is defined as the probability
of no failure from 0 to t, $g(0,t) = R(t) = e^{-\lambda t}$.

The  assumptions  under  which  the  expres¬ 
sion  for  g(x,t)  is  derived  are: 

1)  Component  failure  occurs  when  a  ran¬ 
dom  external  disturbance  or  shock  induces  a 
component  failure. 

2)  The  number  of  shocks  during  any  inter¬ 
val  of  time  is  independent  of  the  number  of 
shocks  occurring  during  other  intervals  of 
time. 

3) The probability of exactly one shock
in a given interval of time is proportional to
the length of time, with a constant of proportionality
$\lambda$.
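
The relation between the Poisson failure process and exponential reliability can be sketched in a few lines. This is a minimal illustration under the assumptions listed above; the function names and the numerical values (failure rate, mission time) are hypothetical.

```python
from math import exp, factorial


def poisson_failures(x, failure_rate, t):
    """g(x, t): probability of exactly x failures in (0, t) for a Poisson process."""
    m = failure_rate * t
    return exp(-m) * m ** x / factorial(x)


def exponential_reliability(failure_rate, t):
    """R(t) = exp(-lambda * t): probability of no failure in (0, t)."""
    return exp(-failure_rate * t)


lam, t = 0.002, 100.0   # hypothetical: failures per hour, mission length in hours
print(poisson_failures(0, lam, t))        # g(0, t)
print(exponential_reliability(lam, t))    # R(t), same value as g(0, t)
```

Setting x = 0 in g(x, t) gives R(t) directly, which is the step the text relies on.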

An important feature of the exponential
failure model is that the constancy of the
hazard rate $\lambda$ implies that reliability is a function
of time but not of the age of a component.
The component does not wear out, but
fails only because of random shocks, i.e.,
momentary concentrations of stress in excess
of strength. There is, however, another
aspect of the exponential which makes it
applicable as a failure model even when the
Poisson assumptions are not fulfilled. It has
been shown that if a component is made of
many elements, each having a different failure
distribution, even a failure distribution
exhibiting wearout, then the component will
tend to exhibit asymptotically a constant
hazard rate as time goes on [49].

Applicability  of  exponential  distribution 
(without  a  warrantee  period)  as  a  hardware 
failure  model: 

Applies to most electronic components
and complex systems (e.g., Central Navigation
Computer, Ship Inertial Navigation System,
Guidance Systems, Electrical Interconnects,
Power Distribution, Servo-Mechanisms, etc.).
Also applicable as a hardware repair
model. Has been proposed for several software
failure models.
5.2.1.2 With a Warrantee Period, η

PDF: $f(t) = \lambda e^{-\lambda (t-\eta)}$, $\lambda, \eta > 0$, $t \ge \eta$

CDF: $F(t) = 1 - e^{-\lambda (t-\eta)}$

R(t): $R(t) = e^{-\lambda (t-\eta)}$

h(t): $h(t) = \lambda$ for $t \ge \eta$

MTTF: usually not of interest

Variance: $1/\lambda^2$ for $t \ge \eta$

Applicability of the exponential distribution
(with a warrantee period) as a hardware failure
model:

Applies to the same components and systems
shown in § 5.2.1.1, but which have a
warrantee period η during which no failure
can occur.

5.2.2 The Binomial Distribution

PMF: $g(N-x) = \binom{N}{x} R^{N-x} (1-R)^{x}$, $x = 0, 1, \ldots, N$

CDF: $G(x) = \sum_{i=0}^{x} \binom{N}{i} R^{N-i} (1-R)^{i}$

Mean: NR

Variance: NR(1 − R) or NRQ (Q = 1 − R)

Process: 

The binomial density function arises from
a Bernoulli process, a process in which an
event, such as a success, can occur with constant
probability R, or a complementary
event, such as a failure, can occur with constant
probability Q (Q = 1 − R), in each trial.
Under these conditions, the binomial PMF,
$g(N-x) = \binom{N}{x} R^{N-x} Q^{x}$, with R representing
the probability of success per trial,
yields the probability, g(N−x), that exactly
N−x successes are obtained in N trials. Examples
of reliability calculations using binomial
tables are given in Appendix E,
§ E.1.3. Figure E-3 provides Binomial Tables
(80% confidence).
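
Where tables are not at hand, the binomial probabilities can be computed directly. The following is a minimal sketch, assuming the PMF $g(N-x) = \binom{N}{x} R^{N-x} Q^{x}$ and a cumulative probability summed over 0 to x failures; the function names and the values R = 0.95, N = 20 are hypothetical.

```python
from math import comb


def prob_successes(successes, trials, r):
    """Probability of exactly `successes` successes in `trials` Bernoulli trials."""
    return comb(trials, successes) * r ** successes * (1.0 - r) ** (trials - successes)


def prob_at_most_failures(x, trials, r):
    """G(x): probability of x or fewer failures in `trials` trials."""
    return sum(prob_successes(trials - i, trials, r) for i in range(x + 1))


# Hypothetical one-shot device: R = 0.95 per trial, N = 20 trials.
print(prob_successes(20, 20, 0.95))         # probability of no failures
print(prob_at_most_failures(1, 20, 0.95))   # probability of at most one failure
```

The mean number of successes computed from this PMF is NR, in agreement with the summary above.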

Applicability  of  the  binomial  as  a  failure 
model: 

Applies  to  one-shot  devices,  failed  or  suc¬ 
cessful  items  in  sampled  lots.  One-shot  de¬ 
vices  commonly  found  in  FBMWS/SWS  in¬ 
clude:  igniters,  stage  separation  devices, 
energy  transfer  system  harness,  propulsion 
devices  such  as  rocket  motors,  launcher  de¬ 
vices  such  as  igniters  or  grain  (propellant), 
warhead  fuzing  devices. 

5.2.3  The  Normal  Distribution 


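PDF: $f(t) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\left(\dfrac{t-\mu}{\sigma}\right)^{2}\right]$, $\sigma > 0$

CDF: $F(t) = \Phi\left(\dfrac{t-\mu}{\sigma}\right)$, where $\Phi$ is the standard normal CDF

R(t): $1 - F(t)$

Mean: $\mu$

Variance: $\sigma^2$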


Process: 

The failure process underlying the normal
has been described in [9]. It arises naturally
when a component's performance depends on
a critical parameter which, because of manufacturing
variability, has an initial value $\alpha_0$
that is normally distributed. Assume also that
this parameter varies during operation so that
its value $\alpha = u(\alpha_0, t)$ is a function of time, and that the
unit fails if it exceeds a value $\alpha_1$. Let $\tau$ be
the instant of failure, such that $u(\alpha_0, \tau) = \alpha_1$,
or $\tau = \psi(\alpha_0, \alpha_1)$. It can be shown that under
these conditions $\tau$ is also distributed normally.
If the mean life of the item is denoted by
$E\{\tau\} = \mu$ and $\sigma$ is the standard deviation of $\tau$,
with $\sigma \ll \mu$, then the normal failure PDF
results.

Applicability  of  the  normal  distribution: 

Critical parameters of hardware exhibiting
symmetrical variability, for example when
times to failure are normally distributed
about some mean value, $\mu$. This is often expected
when hardware enters its wearout
phase. Variables measurements are required.
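
A short numerical sketch can make the wearout behavior concrete. It assumes times to failure that are normally distributed with mean μ and standard deviation σ (σ much smaller than μ), and evaluates reliability and hazard rate using only the standard library. The function names and the values μ = 1000 h, σ = 100 h are hypothetical.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(t, mu, sigma):
    """Normal failure density f(t)."""
    z = (t - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi))


def normal_cdf(t, mu, sigma):
    """F(t) = Phi((t - mu) / sigma), via the error function."""
    return 0.5 * (1.0 + erf((t - mu) / (sigma * sqrt(2.0))))


def normal_reliability(t, mu, sigma):
    """R(t) = 1 - F(t)."""
    return 1.0 - normal_cdf(t, mu, sigma)


def normal_hazard(t, mu, sigma):
    """h(t) = f(t) / R(t); increases with t for the normal model."""
    return normal_pdf(t, mu, sigma) / normal_reliability(t, mu, sigma)


mu, sigma = 1000.0, 100.0   # hypothetical wearout life, in hours
for t in (900.0, 1000.0, 1100.0):
    print(t, normal_reliability(t, mu, sigma), normal_hazard(t, mu, sigma))
```

The printed hazard values rise with t, which is the increasing failure rate (IFR) behavior associated with the wearout phase.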

5.2.4  The  Lognormal  Distribution 


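For the normally distributed variable $y$ (for the lognormal model, $y = \ln t$) with standard deviation $\sigma_y$:

PDF: $f(y) = \dfrac{1}{\sigma_y\sqrt{2\pi}} \exp\left[-\dfrac{(y-\mu)^2}{2\sigma_y^2}\right]$

CDF: $F(y) = \Phi\left(\dfrac{y-\mu}{\sigma_y}\right)$

Mean: $\mu$

Variance: $\sigma_y^2$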
Note: $\mu$ and $\sigma$ are parameters of the lognormal
distribution. They do not have
the usual meaning of mean and
standard deviation as for the normal
distribution.

PDF: $f(t) = \dfrac{1}{\sigma t \sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\left(\dfrac{\ln t - \mu}{\sigma}\right)^{2}\right]$, $\sigma, t > 0$

CDF: $F(t) = \int_0^t f(\xi)\, d\xi$

R(t): $1 - F(t)$

MTTF: $\exp[(\sigma^2/2) + \mu]$

Variance of t: $[\exp(\sigma^2 + 2\mu)]\,[\exp(\sigma^2) - 1]$

Variance of ln(t): $\sigma^2$

Mean of ln(t): $\mu$

Process: 

The lognormal failure process is explained in
[9] for a progressive fracture failure mechanism.
Let $X_1 < X_2 < \ldots < X_n$ be a sequence
of random variables that denote the sizes of
a fatigue crack at successive stages of its
growth. A proportional effect model is assumed
for the growth of these cracks such
that the crack growth at stage i, $\Delta X_i = X_i - X_{i-1}$,
is randomly proportional to the size of
the crack, $X_{i-1}$, and that the item fails when
the crack size reaches $X_n$. When $\Delta X_i \to 0$
as n becomes large, this model leads to the
lognormal failure distribution.

Applicability  of  the  lognormal  distribution  as 
a  failure  model: 

Applies to component measurements when
the distribution of these measurements is
skewed and the distribution can be normalized
by using the logarithm of each measurement.
Used for assessing the reliability of
structural components. Also used as a repair
model.
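
The lognormal summary measures of § 5.2.4 can be evaluated numerically as sketched below. The sketch assumes that μ and σ are the mean and standard deviation of ln(t), with MTTF = exp(μ + σ²/2) and Var(t) = exp(σ² + 2μ)[exp(σ²) − 1]. The function names and parameter values are hypothetical.

```python
from math import erf, exp, log, sqrt


def lognormal_reliability(t, mu, sigma):
    """R(t) = 1 - Phi((ln t - mu) / sigma)."""
    z = (log(t) - mu) / sigma
    return 1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0)))


def lognormal_mttf(mu, sigma):
    """MTTF = exp(mu + sigma^2 / 2)."""
    return exp(mu + sigma ** 2 / 2.0)


def lognormal_variance(mu, sigma):
    """Var(t) = exp(sigma^2 + 2 mu) * (exp(sigma^2) - 1)."""
    return exp(sigma ** 2 + 2.0 * mu) * (exp(sigma ** 2) - 1.0)


mu, sigma = log(100.0), 0.5     # hypothetical: median life of 100 hours
print(lognormal_reliability(100.0, mu, sigma))   # reliability at the median life
print(lognormal_mttf(mu, sigma), lognormal_variance(mu, sigma))
```

Note that the MTTF exceeds the median life exp(μ) because the distribution is skewed to the right, which is why μ cannot be read as the mean time to failure.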


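5.2.5 The Weibull Distribution
(two parameter)

PDF: $f(t) = \dfrac{\beta}{\alpha}\left(\dfrac{t}{\alpha}\right)^{\beta-1} \exp\left[-\left(\dfrac{t}{\alpha}\right)^{\beta}\right]$, $\alpha, \beta > 0$, $t \ge 0$

CDF: $F(t) = 1 - \exp\left[-\left(\dfrac{t}{\alpha}\right)^{\beta}\right]$

R(t): $\exp\left[-\left(\dfrac{t}{\alpha}\right)^{\beta}\right]$

h(t): $\beta t^{\beta-1} / \alpha^{\beta}$

MTTF: $\alpha\,\Gamma((\beta+1)/\beta)$

Variance: $\alpha^2\,[\Gamma((\beta+2)/\beta) - \Gamma^2((\beta+1)/\beta)]$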

Process: 

The  failure  process  underlying  the  Weibull 
failure  model  has  been  conceptualized  as  a 
chain  of  links;  the  links  are  not  all  equally 
strong  but  are  chosen  from  a  population 
having  a  single  distribution  of  breaking 
strengths.  Stress  is  applied  to  the  chain  as  a 
whole  and  is  assumed  to  be  applied  equally 
to  each  link.  The  chain  breaks  (component 
fails)  when  its  weakest  link  fails.  Then  the 
probability  distribution  of  the  time  to  failure 
of  such  a  component  is  a  Weibull. 

Applicability  of  the  Weibull  Distribution  as  a 
failure  model: 

Normally applied to strength of structures and to
electrical connections subjected to physico-chemical
degradation. Also applied to simple devices which
display IFR (shape parameter $\beta > 1$) or DFR ($\beta < 1$)
characteristics.
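
The dependence of the Weibull hazard rate on the shape parameter can be sketched as follows. The sketch assumes the two-parameter form with scale α and shape β, R(t) = exp[−(t/α)^β]; the function names and the value α = 10 are hypothetical.

```python
from math import exp, gamma


def weibull_reliability(t, alpha, beta):
    """R(t) = exp(-(t / alpha) ** beta)."""
    return exp(-((t / alpha) ** beta))


def weibull_hazard(t, alpha, beta):
    """h(t) = (beta / alpha) * (t / alpha) ** (beta - 1)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1.0)


def weibull_mttf(alpha, beta):
    """MTTF = alpha * Gamma((beta + 1) / beta)."""
    return alpha * gamma((beta + 1.0) / beta)


alpha = 10.0   # hypothetical scale parameter, in hours
for beta in (0.5, 1.0, 2.0):   # DFR, CFR, IFR
    print(beta, weibull_hazard(1.0, alpha, beta), weibull_hazard(5.0, alpha, beta),
          weibull_mttf(alpha, beta))
```

With β = 1 the hazard rate is the constant 1/α and the model reduces to the exponential; β < 1 gives a decreasing and β > 1 an increasing hazard rate.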


5.2.6 Other Distributions Used in
Reliability  Modeling 

Many  more  distributions  have  been  used  in 
modeling  reliability  and  availability.  The  fol¬ 
lowing  list  is  far  from  exhaustive. 

a. The Birnbaum-Saunders Fatigue Life
Distribution [9]. Applicable to structures.

b. The Competing-Risk Model [9]. Applicable
to more complex elements or components
where there is more than one mechanism for
failure.

c. The Mixed Distribution Models [9]. Applicable
to physical dimensions of mass-produced
items.

d. The General Distribution of time to failure
[9]. Applicable to truly complex elements,
components or equipments which
display a typical “bathtub” hazard rate.

e. The Piecewise Linear Models [24]. Applicable
to complex items for which an empirical
“bathtub” hazard rate is modeled by
means of linear segments of the form
$h(t) = a_i - b_i (t_i - t_{i-1})$.

f. The Polynomial Reliability Model [24].
Applicable to complex items susceptible to
being modeled empirically by a polynomial of
high degree.

g. The mixed distribution method of
Calvin [25]. This method makes use of a
single continuous equation to describe a
“bathtub” reliability model. Known failure
mechanisms of the component are incorporated
in the model through an “additive”
procedure. Up to two unknown failure processes
can be considered to build up the bathtub
curve.

h. The Bayesian Beta prior and conjugate
Beta posterior of the binomial failure model
[9].

i.  The  Bayesian  Beta  prior  and  conjugate 
Beta  posterior  of  the  negative  binomial  failure 
model  [9] . 

j.  The  Bayesian  gamma  prior  and  conjugate 
gamma  posterior  of  the  exponential  failure 
model  [9] . 

k.  The  Bayesian  Inverted  gamma  prior  and 
conjugate  inverted  gamma  posterior  of  the 
Weibull  failure  model  [9] . 

l. The General Failure Rate Function
Model [26], which can be made to fit empirically
many types of data.

m. The Mixed Weibull-Gamma Distribution
Model [26], which has a bathtub shape
and can be made to fit empirically equipment
exhibiting infant mortality and wearout.

n. The uniform and truncated uniform
prior on the Binomial Failure Model.

o. The Bivariate exponential model of
reliability [26], which is applicable to components
subjected to three different types of
Poisson disturbances.

p. The three major types of Extreme-Value
Distributions [5, 9] used in corrosion problems
and in stress-strength interference
models.

In  addition,  many  other  statistical  distribu¬ 
tions  are  used  in  software  reliability  and  as 
Reliability  Growth  models. 
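
For the Bayesian entries in the list above, the conjugate updates take a particularly simple form. The sketch below uses the standard textbook results (Beta prior with binomial data, gamma prior with exponential data); it is an illustration rather than the procedure of reference [9], and the function names and numbers are hypothetical.

```python
def beta_binomial_update(a, b, successes, failures):
    """Posterior Beta(a', b') on reliability R after pass/fail data."""
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def gamma_exponential_update(shape, rate, failures, total_test_time):
    """Posterior gamma(shape', rate') on the failure rate lambda."""
    return shape + failures, rate + total_test_time


# Uniform prior on R is Beta(1, 1); hypothetical data: 19 successes, 1 failure.
a_post, b_post = beta_binomial_update(1, 1, 19, 1)
print(a_post, b_post, beta_mean(a_post, b_post))

# Hypothetical gamma prior (shape 2, rate 1000 h) updated with 3 failures in 5000 h.
print(gamma_exponential_update(2, 1000.0, 3, 5000.0))
```

In both cases the prior parameters act like pseudo-data (prior successes and failures, or prior failures and test time), which is how the strength of the prior information is controlled.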

5.3  SELECTION  OF  A  FAILURE  MODEL 

This  paragraph  treats  the  problem  of  what 
to  do  when  failure  data  are  available  and  one 
is  confronted  with  a  preliminary  selection  of 
models  for  the  data  at  hand.  A  suggested 
procedure  is  presented  next  in  the  form  of  a 
roadmap. 

5.3.1 Roadmap for the Selection of a
Failure  Model 

Figure  5-4  shows  the  various  factors  and 
options  involved  in  the  selection  of  a  failure 
model  when  test  data  are  available.  The  road¬ 
map  recognizes  that  components  placed  on 
test  may  fall  into  any  one  of  three  categories 
of  design  maturity,  namely:  (a)  a  fully  mature 
design,  (b)  a  partially  mature  design  and  (c)  a 
new,  immature  design.  The  roadmap  also 
recognizes  that  test  data  generated  and  used 
for  point,  interval  and  trend  estimates  of  com¬ 
ponent  reliability  may  be  any  one  of  the 
following:  (1)  times  to  failure,  (2)  cycles  to 
failure,  or  (3)  pass/fail  data;  it  may  include 
also  (or  only)  stress/strength  or  other  var¬ 
iables  (parameter)  performance  data.  The 
type  of  equipment  or  component  tested  will 
normally  dictate  the  type  of  data  generated 
and  collected  (also  see  section  9). 

5.3.1.1 Test of Data for Homogeneity

After  data,  e.g.  failure,  stress/strength 
parameters,  etc.  are  at  hand,  one  of  the  first 
questions  to  be  answered  is:  Are  the  test  data 
homogeneous?  That  is,  did  the  sample  data  all 
come  from  the  same  parent  population?  In 
real life, a positive answer generally requires
that  all  hardware  items  used  in  the  test  sample 
have  been  built  from  the  same  drawing,  and 
under  manufacturing  operations  that  were  in 
a  state  of  statistical  control.  Items  built  to  a 
mature  design  and  under  fully  developed  man¬ 
ufacturing  procedures  and  controls  are  ex¬ 
pected  to  yield  homogeneous  failure  data  or 
stress/strength  variables  performance  data 
under  similar  test  conditions.  Conversely, 
items  built  to  a  changing  design  or  under 
changing  manufacturing  processes  are  expect¬ 
ed  to  yield  non-homogeneous  data  because  of 
inherent  differences  in  the  makeup  of  the  test 
items.  In  many  instances  the  analyst  may  have 
a  high  confidence,  based  on  a  review  of  the 
above  factors  and  their  relevance  to  a  specific 
application,  that  the  test  data  are  either 
homogeneous  or  non-homogeneous.  If  any 
doubt  exists,  the  analyst  should  find  one  or 
more  of  the  following  statistical  techniques 
helpful  in  conducting  homogeneity  analyses: 
trend  analysis  based  on  hypothesis  testing  (see 
an  example  below),  preparation  of  sample 
frequency  distributions  (see  an  example  in 
§  5.4.4),  or  quality  control  chart  analysis 
[50]. 


Homogeneity  Test  by  Trend  Analysis 

As stated in [1], the simplest means to
detect a trend is visually, by plotting cumulative
number of failures versus cumulative
operating time, using data in its original
chronological order. If there is reliability
growth, times between failures become larger,
and the plot is concave down. If reliability
decays, times between failures become smaller,
and the plot is concave up (see figure 5-5).

For a quantitative trend test for time to
failure data, Laplace’s test can be employed
as indicated in [1]. Assume that x failures
have been experienced during a test.
Laplace’s test uses the u statistic of Equation
(5-1), which is approximately distributed as N(0, 1)
for x > 3.

$u = \left[\dfrac{\sum_{i=1}^{x} t_i}{xT} - \dfrac{1}{2}\right] (12x)^{1/2}$   (5-1)

Here the $t_i$ are the observed times to failure
(the intervals between successive failures).
T, which is generally unknown, is approximated
by the largest time to failure observed.

The  null  hypothesis  of  the  test  (see  Appen¬ 
dix  C)  is  that  the  data  originates  from  a 
Homogeneous  Poisson  Process  (HPP).  If  u  is 
small,  reliability  growth  is  probably  happen¬ 
ing.  If  u  is  large,  reliability  degradation  is  the 
likely  situation. 

Example: 

Six failures occur at times 0.5 day, 1 day, 2
days, 2 weeks, 6 weeks, and 16 weeks for an
element which is replaced on test after repair
and modification. Assume times to repair and
modify are negligible. Testing terminates at
20 weeks. Clearly this element has undergone
reliability growth. But how sure are we of
this? The $t_i$'s (intervals between failures), in
weeks, are 0.07, 0.07, 0.14, 1.71, 4, and 10;
x = 6; T = 10; u = −1.97.

[Figure 5-5. Visual Reliability Trend Analysis. Horizontal axis: cumulative test time (weeks); O = reliability growth, X = reliability decay.]

A cumulative normal distribution table
[13] indicates that the area of the critical region
to the left of −1.97 is 0.0244, so that the data
presented fail the HPP (null) hypothesis at the
2.5% level of significance (since 0.0244 < 0.025);
i.e., the data are non-homogeneous.

Several  additional  methods  of  trend  detec¬ 
tion  are  presented  in  [2]  and  [3] . 
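
The Laplace test of the example can be reproduced with a few lines of code. This sketch assumes that Equation (5-1) has the form u = [Σt_i/(xT) − 1/2](12x)^{1/2}, applied to the intervals between failures with T approximated by the largest interval, which reproduces the example's value to within rounding. The function names are hypothetical.

```python
from math import erf, sqrt


def laplace_u(intervals, T=None):
    """Laplace trend statistic u for a list of times between failures."""
    x = len(intervals)
    if T is None:
        T = max(intervals)   # approximation used in the text when T is unknown
    return (sum(intervals) / (x * T) - 0.5) * sqrt(12.0 * x)


def standard_normal_cdf(z):
    """Phi(z), used to find the lower-tail probability of u."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


intervals_weeks = [0.07, 0.07, 0.14, 1.71, 4.0, 10.0]
u = laplace_u(intervals_weeks)
print(u, standard_normal_cdf(u))
```

With the rounded intervals the statistic comes out at about −1.98, in line with the −1.97 quoted in the example, and the lower-tail probability is below 0.025, so the conclusion (rejection of the HPP hypothesis at the 2.5% level) is unchanged.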

Returning to figure 5-4, it is observed that
if the results of the homogeneity test indicate
the failure data to be non-homogeneous, the
use of a reliability growth model is warranted
(§ 5.4.4). If the data are judged to be homogeneous,
the data should next be analyzed to
determine the appropriate failure model.

5.3.1.2 Selection of a Failure Model Based
on Hazard Rate Sketching or
Hazard Rate Plotting

The next step in failure model selection is sketching or plotting the
data. Hazard rate sketching, which works for time to failure data
reordered by magnitude, consists in sketching an approximation to the
hazard rate of the failure data at hand and comparing the sketch
visually with the theoretical hazard rates of the main probability
models, as illustrated in figures 5-6 through 5-8.

Notice that, for the three distributions considered, the following
results are always true.

Exponential: The hazard rate is a constant ($\lambda$ or $1/\theta$)

Normal: The hazard rate increases

Weibull: The hazard rate increases ($\beta > 1$), remains constant
($\beta = 1$), or decreases ($\beta < 1$)

Example  of  Hazard  Rate  Sketching  Based  on 
Times  to  Failure 

Times of failure of the example of § 5.3.1.2 were originally labeled:

$t_1 = 12$, $t_2 = 6$, $t_3 = 19$, $t_4 = 17$, $t_5 = 2$, $t_6 = 8$,
$t_7 = 16$, $t_8 = 7$, $t_9 = 14$, $t_{10} = 19$, $t_{11} = 11$ and
$t_{12} = 5$.

For the purpose of further analysis and to indicate that time
reordering by magnitude is taking place, we relabel, such that:

$t_{(1)} = 2$, $t_{(2)} = 5$, $t_{(3)} = 6$, $t_{(4)} = 7$, $t_{(5)} = 8$,
$t_{(6)} = 11$, $t_{(7)} = 12$, $t_{(8)} = 14$, $t_{(9)} = 16$,
$t_{(10)} = 17$, $t_{(11)} = 19$, $t_{(12)} = 19$.


Figure 5-9 shows a hazard sketch (ordinary graph paper) of these
ordered times to failure data. The usual approximation, based on the
mean rank, for the hazard rate at the ith ordered failure time
$t_{(i)}$ is given by:

$h[t_{(i)}] \approx 1 / \{ [t_{(i+1)} - t_{(i)}] [x - i + 1] \}$  (5-2)

For small samples, however, say for x < 8, a better approximation
based on the median rank [5, page 31] or [7] is given by:

$h[t_{(i)}] = 1 / \{ [t_{(i+1)} - t_{(i)}] [x - i + 0.7] \}$  (5-3)

Using equation 5-2 on the data above one encounters a problem. The
last two $t_{(i)}$'s, $t_{(11)}$ and $t_{(12)}$, are equal and the
approximation to the hazard rate goes to infinity in this case at
$t_{(11)}$. Before deciding what to do about it, one may sketch the
hazard rate using (5-2) up to $t_{(10)} = 17$ (see figure 5-9). The
hazard rate thus drawn shows a rapid increase. One cannot include
$h(t_{(11)})$ in the sketch, but $h(t_{(11)}) \to \infty$ no longer
seems odd when one attempts to draw a smooth dashed line to represent
a probable hazard rate through the points. Perhaps that line should be
boldly inflected upward near t = 19 to indicate that $h(t_{(11)})$ is
not so much an “outlier” as an indication of a rapidly increasing
trend. However, $h(t_{(11)})$ could have been an outlier. Hazard rate
sketching has the drawback of being “noisy” in the sense that it
magnifies the effect of a bad point. Should such an “outlier” be
suspected, it is better to include it in a preliminary sketch than to
dismiss it as unrepresentative before viewing its effect.
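The points of such a sketch can be tabulated with a short Python routine. This is a sketch under stated assumptions: the function name and the `offset` argument are hypothetical conveniences (1.0 reproduces the mean-rank form 5-2, 0.7 the median-rank form 5-3), and a zero gap between tied failure times is reported as an infinite hazard rate rather than being smoothed.

```python
import math

def hazard_sketch(times, offset=1.0):
    """Approximate hazard rate at each ordered failure time except the last.

    offset=1.0 gives the mean-rank form (5-2); offset=0.7 gives the
    median-rank form (5-3).
    """
    ordered = sorted(times)
    x = len(ordered)
    points = []
    for i in range(1, x):  # i is the 1-based rank of the ordered time
        gap = ordered[i] - ordered[i - 1]  # t(i+1) - t(i)
        rate = math.inf if gap == 0 else 1.0 / (gap * (x - i + offset))
        points.append((ordered[i - 1], rate))
    return points

# Times to failure of the example, in their original (unordered) labeling.
times = [12, 6, 19, 17, 2, 8, 16, 7, 14, 19, 11, 5]
for t, h in hazard_sketch(times):
    print(t, h)
```

The last point, at the tied value 19, comes out infinite, which is the difficulty discussed above; the first ten points are the ones plotted.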

Since the data sketched in figure 5-9 were times to failure, reference
to figures 5-7 and 5-8 indicates that both the Normal and the Weibull
are candidate distributions for times to failure. At this point a
Goodness-of-Fit test of the data could be made to check conformance of
the data to each of the two candidate distributions. If neither
distribution is rejected by the Goodness-of-Fit test, the analyst
should select the distribution that is more easily applied.

Example  of  Weibull  Hazard  Rate  Plotting 

Special graph paper is commercially available for Weibull Hazard Rate
Plotting [20], [21]. On this graph paper, the hazard rate from Weibull
failure times appears as a straight line. The graph paper also allows
the estimation of $\alpha$ and $\beta$ (see § 5.2.5) with simple
geometric constructions.


The  data  on  this  page  have  been  gathered 
or  calculated: 

The  Weibull  hazard  plot  corresponding  to 
these  data  points  is  shown  in  figure  5-10.  The 
plot  is  of  ordered  times  to  failure  vs  cumu¬ 
lative  hazard  rate,  and  is  a  straight  line  for  the 
Weibull  on  log-log  graph  paper. 

The advantage of hazard rate plotting, as expounded in [17], [18], and
[19], is that it accommodates arbitrarily censored data. Hazard rate
paper also exists for the Lognormal, and is simple to construct for
the exponential [18].

Figure 5-10. Weibull Hazard Plot
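As a numerical counterpart to the graph paper, the cumulative hazard values can be computed directly. The sketch below assumes the reverse-rank construction commonly used for hazard plotting (each failure contributes 1/k, where k is the number of units still at risk) and a Weibull cumulative hazard of the form H(t) = (t/scale)^shape; the function names and the least-squares fit are illustrative choices, not a procedure prescribed by this document.

```python
import math

def cumulative_hazard(observations):
    """Return (time, cumulative hazard) points for the failures.

    observations is a list of (time, failed) pairs; failed=False marks a
    unit removed from test without failing (a censored time).
    """
    ordered = sorted(observations, key=lambda pair: pair[0])
    n = len(ordered)
    total = 0.0
    points = []
    for position, (time, failed) in enumerate(ordered):
        reverse_rank = n - position  # units still at risk at this time
        if failed:
            total += 1.0 / reverse_rank
            points.append((time, total))
    return points

def weibull_from_hazard(points):
    """Least-squares line through (ln t, ln H); returns (shape, scale)."""
    xs = [math.log(t) for t, _ in points]
    ys = [math.log(h) for _, h in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    intercept = my - slope * mx
    # ln H = shape * ln t - shape * ln scale
    return slope, math.exp(-intercept / slope)
```

Censored times lower the reverse rank of later failures without adding a plotted point, which is how the method accommodates arbitrary censoring.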

5.3.1.3 Selection of a Failure Model Based on
Trend Plots of Failure Probability
(Cycles to Failure and Pass/Fail Data)

The  objectives  of  this  step  will  generally 
have  been  accomplished  in  the  homogeneity 
tests  of  cycles  to  failure  and  pass/fail  data 
cited  in  the  early  steps  of  figure  5-4.  If  not 
done  previously,  moving  averages  of  the  test 
data  can  be  plotted  to  display  presence  or 
absence  of  significant  trends.  Absence  of  a 
significant  trend,  plus  random  scatter  of 
plotted  points  within  postulated  two-sided 
binomial  distribution  control  limits,  would 
support a conclusion that the test data came from a population with
constant failure probability; hence the use of a binomial failure
model would be warranted.
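A minimal sketch of this moving-average check follows, assuming pass/fail outcomes coded 1 = fail and 0 = pass. The window length, the 5% level, and both function names are illustrative assumptions; the control limits are taken as quantiles of a binomial distribution with the postulated failure probability.

```python
from math import comb

def moving_failure_fraction(outcomes, window):
    """Fraction of failures (1 = fail, 0 = pass) in each sliding window."""
    return [sum(outcomes[k:k + window]) / window
            for k in range(len(outcomes) - window + 1)]

def binomial_limits(window, p, alpha=0.05):
    """Two-sided limits on the failure fraction in a window of given size."""
    def quantile(level):
        cdf = 0.0
        for k in range(window + 1):
            cdf += comb(window, k) * p**k * (1 - p)**(window - k)
            if cdf >= level:
                return k / window
        return 1.0
    return quantile(alpha / 2), quantile(1 - alpha / 2)

fractions = moving_failure_fraction([0, 0, 1, 0, 1, 1], 3)
low, high = binomial_limits(10, 0.5)
```

Plotted fractions that stay inside the limits and show no drift are consistent with a constant failure probability.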

5.3.1.4  Plot  of  Test  Data  in  Frequency 
Distribution  Form 

A graphic portrayal of the test data in frequency distribution form
(Figure 5-12) is very helpful in displaying symmetrical or
asymmetrical properties of the distribution of
stress/strength/parameter performance data. A symmetrical, bell-shaped
frequency distribution (as for Action Time in figure 5-14,
§ 5.4.2.3.2) would be considered as evidence that the parent
distribution is Normal. Also, when sample data from a Normal
population are plotted on Normal probability paper, the cumulative
percentage points should fall on or close to a straight line, as in
figure 5-15, § 5.4.2.3.2. If the frequency distribution is skewed (as
for Peak Pressure in figure 5-14, § 5.4.2.3.2), one should plot the
data on another candidate probability paper, e.g., lognormal
probability paper. If a linear trend is observed (as in figure 5-16,
§ 5.4.2.3.2), one may assume the sample data came from a Lognormal
population.


5.3.1.5 Verification (or Refutation) of the
Applicability of the Candidate
Failure Model by a Goodness-of-Fit
Test*

The last step in the selection of a failure model (figure 5-4) is to
perform a goodness-of-fit test of the sample data to each candidate
failure model. The two dominant goodness-of-fit tests described in
statistical textbooks [3], [9] are the Kolmogorov-Smirnov Test and the
Chi-Square Test. These tests are also discussed in Appendix C.

A  Goodness-of-Fit  Test  for  the  Weibull 

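A Goodness-of-Fit test applicable to uncensored or censored samples of
the two-parameter Weibull is described in [46]. A more accessible
presentation of the same method is given in [5].

This method consists in calculating the statistic

$S = \frac{\sum_{i=[x/2]+1}^{x-1} u_{(i)}/M_i}{\sum_{i=1}^{x-1} u_{(i)}/M_i}$, with $u_{(i)} = \ln t_{(i+1)} - \ln t_{(i)}$  (5-4)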


*Note  of  Caution:  Confidence  in  the  results  of  a 
goodness-of-fit  test  is  influenced  greatly  by  sample 
size.  Sample  sizes  of  the  order  of  100  or  more  are 
generally  required  to  be  reasonably  confident  of 
rejecting  a  bad  failure  model.  When  sample  sizes 
used  are  appreciably  less  than  100,  the  analyst 
may  find  that  no  candidate  failure  model  will  be 
rejected,  irrespective  of  the  number  of  models 
analyzed. 
(i)    t(i)    Ln t(i)    u(i)     M_i        u(i)/M_i   Sum u(i)/M_i

 1      0.1    -2.303     0        1.020551    0           0
 2      0.1    -2.303     1.609    0.521285    3.087       3.087
 3      0.5    -0.693     0.588    0.355415    1.654       4.741
 4      0.9    -0.105     0.288    0.272945    1.054       5.795
 5      1.2     0.182     0.981    0.223885    4.381      10.176
 6      3.2     1.163     0.363    0.191578    1.894      12.070
 7      4.6     1.526     0.314    0.168899    1.862      13.933
 8      6.3     1.841     0.188    0.152286    1.232      15.164
 9      7.6     2.028     0.560    0.139783    4.003      19.168
10     13.3     2.588     1.298    0.130219    9.967      29.135
11     48.7     3.866     0.000    0.122871    0.000      29.135
12     48.7     3.866     0.022    0.117274    0.190      29.325
13     49.8     3.908     0.504    0.113132    4.451      33.777
14     82.4     4.412     0.022    0.110268    0.196      33.973
15     84.2     4.433     0.619    0.108598    5.702      39.675
16    156.4     5.052     0.030    0.108124    0.274      39.948
17    161.1     5.082     0.032    0.108944    0.297      40.246
18    166.4     5.114     0.576    0.111289    5.175      45.421
19    296.0     5.690     0.089    0.115596    0.769      46.189
20    323.5     5.779     0.066    0.122683    0.539      46.728
21    345.6     5.845     0.104    0.134165    0.772      47.500
22    383.3     5.949     0.040    0.153650    0.263      47.763
23    399.1     5.989     0.049    0.191137    0.256      48.019
24    419.1     6.038     0.404    0.289773    1.396      49.414
25    628.0     6.443

Here u(i) = Ln t(i+1) - Ln t(i).

where [x/2] denotes the greatest integer ≤ x/2, i.e., for x = 25,
[x/2] = 12. The $M_i$ and the critical values of S have been tabulated
[5] for i = 3(1)25.

The test cannot be directly applied to the data of § 5.3.1.3 because
they exceed 25 data points, but since the method is applicable to
censored data, it is at least possible to take the first 25 ordered
times and perform a Goodness-of-Fit test on these points.


From the entries in the last column of the data on this page:

$S = \frac{49.414 - 29.325}{49.414} = 0.407.$


The critical value of S at the $\alpha$ = 5% level of significance is
0.65 in the referenced tables [5, Appendix 13]. Even for the
$\alpha$ = 25% level of significance, the critical value of S is still
0.56.

The hypothesis that the first 25 ordered times of the given sample are
Weibull is therefore accepted. If one wished to include the 5 points
which have not been considered, one would also perform the test on the
last 25 points.
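The arithmetic of this example can be checked with the short Python sketch below. The list holds the 24 tabulated ratios of each log-time difference to its $M_i$ value, with the illegible eleventh entry taken as 0.000 because the running total does not change on that row. The function name is hypothetical, and the form of S (upper-half sum over total sum) is inferred from the worked calculation above.

```python
# Ratios [Ln t(i+1) - Ln t(i)] / M_i for i = 1..24 (x = 25 ordered times).
ratios = [0.000, 3.087, 1.654, 1.054, 4.381, 1.894, 1.862, 1.232,
          4.003, 9.967, 0.000, 0.190, 4.451, 0.196, 5.702, 0.274,
          0.297, 5.175, 0.769, 0.539, 0.772, 0.263, 0.256, 1.396]

def weibull_s_statistic(ratios):
    """S = (sum over i = [x/2]+1 .. x-1) / (sum over i = 1 .. x-1)."""
    x = len(ratios) + 1
    half = x // 2  # greatest integer not exceeding x/2
    return sum(ratios[half:]) / sum(ratios)

s = weibull_s_statistic(ratios)
rejected_at_5_percent = s > 0.65  # critical value quoted in the text
print(round(s, 3), rejected_at_5_percent)
```

Comparing S against the tabulated critical value is the whole test; the $M_i$ constants themselves must still come from the referenced tables.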

Goodness-of-Fit  Tests  for  the  Normal 

Lilliefors' modification of the Kolmogorov-Smirnov Test [16, 5]
applies to distributions which are assumed to be, under the null
hypothesis, normal with unknown $\mu$ and $\sigma$. This test applies
to lognormal repair times as well, if the logarithms of the repair
times are considered instead of the repair times themselves; an
example of application of this test to data transformed to normal is
shown in Appendix C § C.1. Another goodness-of-fit test for the normal
is the Shapiro-Wilk W test illustrated in figure 5-17.

Goodness-of-Fit Test for the Exponential

Lilliefors' modification of the Kolmogorov-Smirnov test for the
exponential case is useful both in reliability and availability
assessment and is illustrated in Appendix C § C.1.

5.4  RELIABILITY  ASSESSMENT 
METHODS  AND  ILLUSTRATED 
APPLICATIONS 

Once  the  model  (distribution)  has  been  de¬ 
termined,  the  data  must  be  analyzed  to  pro¬ 
vide  reliability  assessment.  This  subsection 
contains  basic  descriptions  of  the  principal 
and  current  methods  used  for  assessing  hard¬ 
ware  (and  software)  reliability.  They  include 
the  following  in  their  order  of  presentation: 

§ 5.4.1 Classical Methods
§ 5.4.2 Variables Reliability Methods
Based on the Normal Distribution
§ 5.4.3 R Assessment Method Based on
Stress-Strength Interference Model
§ 5.4.4 Methods for Assessing R Growth
§ 5.4.5 Rubinstein Method
§ 5.4.6 Bayesian Methods
§ 5.4.7 Methods for Adjusting R Estimation,
Derating and Uprating.

Illustrated  applications  of  these  methods  are 
provided. 

5.4.1  Classical  Methods  for  Estimating 
Hardware  (and  Software)  Reliability 

Classical  methods  used  for  estimating  hard¬ 
ware  (and  software)  reliability  are  loosely 
defined  as  those  reliability  point,  interval  and 
trend  estimation  techniques  based  upon  fun¬ 
damental  statistical  distributions  including  the 
Binomial,  Exponential,  Normal  and  Weibull. 
These methods are documented in summary form in figure 5-11. Their
applications are illustrated on a sample basis below.

5.4.1.1 Reliability Point and Interval
Estimation Using the Exponential
Model

Point and interval estimation formulas and results depend on the
testing procedure (§ 5.1.1) from which failure data are obtained and
the mathematical technique used. This point will be made clear in the
following subparagraphs. As stated in § 5.1.3, the most desirable R
and A estimators are those that are unbiased and of minimum variance.

5.4.1.1.1 Maximum Likelihood (ML)
Point Estimations of $\lambda$, $\theta$, R, and
Calculations of $\lambda_U$ (upper
confidence limit of $\lambda$).

Another method for obtaining reliability estimates is the Maximum
Likelihood (ML) Method. ML estimates are presented for type II
censoring without and with replacement and for type I censoring
without and with replacement.

5.4.1.1.1.1 Type II Censoring without
Replacement

It is shown in [9], by making use of order statistics, that the ML
estimator of $\theta$ for failure truncated tests is:

$\hat{\theta} = \left[ \sum_{i=1}^{x} t_i + (n-x) t_x \right] / x$  (5-5)

where n is the number of items on test, $t_i$ represents the time to
failure of each item on test since the beginning of testing, and $t_x$
the time to failure of the last item to fail.

Example, Type II Censoring Without Replacement:

Assume that 15 exponential items are put on test and the decision is
made to stop testing after the 8th failure. The recorded times to
failure in mission equivalents are:

$t_1 = 2.504$, $t_2 = 4.877$, $t_3 = 7.657$, $t_4 = 11.170$,
$t_5 = 14.675$, $t_6 = 25.423$, $t_7 = 28.075$, and $t_8 = 57.588$.

Then, from equation 5-5:

$\hat{\theta} = [151.969 + (15-8)(57.588)]/8 = 69.385$ missions,

$\hat{\lambda} = 1/\hat{\theta} = 0.0144123$ failure/mission

and

$\hat{R}$ (for 1 mission) $= e^{-\hat{\lambda}} = 0.9857$

[Figure 5-11. Classical Methodology Summary for Assessing Component R,
MTTF, and Failure Rate (Continued)]
It must be noted that, in the ML method, it is permissible to replace
a parameter by its estimate in a function to obtain a proper estimate
of the function. Other methods do not generally have this property.

The ML method does not guarantee unbiasedness of the estimate,
however, and while $\hat{\theta}$ is unbiased for this type of test
with exponential components, $\hat{\lambda}$ is not. An unbiased
estimate of $\lambda$ is given by:

$\hat{\lambda}_{Unbiased} = \left( \frac{x-1}{x} \right) \hat{\lambda}$  (5-6)

For the data above: $\hat{\lambda}_{Unbiased}$ = 0.0126108
failures/mission.

The upper confidence limit, $\lambda_U$, for $\lambda$ is given in [9]
(also see Appendix D § D.1) by:

$\lambda_U = \hat{\lambda} \, \chi^2_{1-\alpha;2x} / 2x$  (5-7)

For a confidence $\gamma = 0.8 = 1 - \alpha$, from Appendix E,
figure E-1,

$\chi^2_{0.8;16} = 20.465$

therefore

$\lambda_U$ = 0.0184341 failures/mission

If only N = 8 items had been placed on test and the same times to
failure had been obtained, then the use of the same formulae would
have given:

$\hat{\theta} = (151.969 + 0)/8 = 18.996$ missions

$\hat{\lambda} = 1/\hat{\theta} = 0.0526427$ failures/mission

$\hat{\lambda}_{Unbiased}$ = 0.046062 failures/mission

$\lambda_U = (0.0526427)(20.465)/16 = 0.0673333$ failures/mission

The uncensored test is of special interest since the original data
were obtained from a Monte-Carlo simulation with $\lambda$ selected to
be 0.07. The fact that $\lambda_U$ did not bracket the actual
$\lambda$ expresses the occurrence of one in five chances of failing
to do so from a random sample (80% confidence).

5.4.1.1.1.2 Type II Censoring with
Replacement

It is shown in [9] that:

$\hat{\lambda} = x / (N t_x)$  (5-8)

$\lambda_U = \chi^2_{1-\alpha;2x} / (2 N t_x)$  (5-9)

Example, Type II Censoring with Replacement:

Assume N = 15 and the same data as in § 5.4.1.1.1.1, then:

$\hat{\lambda} = 8/(15)(57.588) = 0.00926119$ failures/mission

and, at the 80% confidence level:

$\lambda_U = (20.465)/(30)(57.588) = 0.0118456$ failures/mission

5.4.1.1.1.3 Type I Censoring without
Replacement

Again from [9], we have:

$\hat{\lambda} = x / \left[ \sum_{i=1}^{x} t_i + (N-x)T \right]$  (5-10)

where T is the preassigned test duration.

$\hat{\lambda}$ is unbiased for this type of test with exponential
components, but $\hat{\theta}$ is biased.

The problem of obtaining a theoretically satisfying lower confidence
limit on $\theta$ is shown to be extremely complex in [9, p. 173-174],
where only a conservative solution is provided.
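The Type II figures quoted in equations 5-5 through 5-9 can be reproduced with the short Python sketch below. It is an illustration only: the variable names are hypothetical, and the chi-square value 20.465 is copied from the text rather than computed.

```python
import math

# Times to failure of the Type II example, in mission equivalents.
times = [2.504, 4.877, 7.657, 11.170, 14.675, 25.423, 28.075, 57.588]
n, x = 15, len(times)   # items on test, failures observed
t_x = times[-1]         # time of the last (8th) failure
chi2_80_16 = 20.465     # chi-square value quoted in the text (80%, 16 d.f.)

# Without replacement: equations 5-5, 5-6 and 5-7.
theta = (sum(times) + (n - x) * t_x) / x
lam = 1.0 / theta
reliability = math.exp(-lam)             # one mission
lam_unbiased = (x - 1) / x * lam
lam_upper = lam * chi2_80_16 / (2 * x)

# With replacement: equations 5-8 and 5-9.
lam_repl = x / (n * t_x)
lam_upper_repl = chi2_80_16 / (2 * n * t_x)

print(theta, lam, reliability, lam_unbiased, lam_upper)
print(lam_repl, lam_upper_repl)
```

Setting `n = 8` in the same script gives the uncensored case discussed in the text.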

5.4.1.1.1.4 Type I Censoring with
Replacement

From [9],

$\hat{\lambda} = x/NT$  (5-11)
Again $\hat{\lambda}$ is unbiased but $\hat{\theta} = 1/\hat{\lambda}$
is biased. An unbiased estimator of $\theta$ is

$\hat{\theta}_{Unbiased} = \left( \frac{x}{x+1} \right) \hat{\theta}$  (5-12)

A conservative formulation for $\lambda_U$ from [9], also derived from
a different viewpoint in Appendix D § D.1, is:

$\lambda_U = \chi^2_{1-\alpha;2x+2} / (2NT)$  (5-13)

Example, Type I Censoring with Replacement:

A test with a single replaceable item on test is to run for 100
missions. The ordered times to failure, in missions, are:

$t_1 = 0.045$, $t_2 = 0.538$, $t_3 = 4.000$, $t_4 = 8.303$,
$t_5 = 12.518$, $t_6 = 23.962$, and $t_7 = 32.350$.

Then:

$\hat{\lambda} = 7/(1)(100) = 0.0700000$ failures/mission

$\hat{\theta} = 1/\hat{\lambda} = 14.2857$ missions

$\hat{\theta}_{Unbiased}$ = 12.50000 missions

And at the 80% confidence level,

$\lambda_U = \chi^2_{.8;16}/(2)(100) = .102325$ failures/mission

The data in this example were obtained with a Monte-Carlo simulation
of the exponential where the “true” value of $\lambda$ was set to be
0.05 failures/mission.

5.4.1.2 Reliability Assessment Point and
Interval Estimation Using the
Binomial Model

The binomial is applicable in life-testing situations when times to
failure are unknown or irrelevant. Also, the binomial is applicable in
Bernoulli sampling from large lots, or sampling with replacement, so
that reliability may be assumed to remain constant in successive
trials.

If N is large with respect to the population M of components, i.e.,
N/M > 0.1, the hypergeometric distribution should be used, rather than
the binomial [11, p. 40-43].

If testing is effected on a variable number of units N until x
failures occur, then the appropriate PMF for N considered as a random
variable is not the binomial but the negative binomial. This method of
sampling for defectives (which is sometimes preferred because it is
less costly than Bernoulli sampling when expensive units are destroyed
in testing, for instance) is called Pascal sampling [12].

Reliability assessments which make use of the binomial PMF and CDF are
illustrated in this subparagraph by means of the following examples.

Example, Pass/Fail or Bernoulli Data:

Twenty-five one-shot items are test fired. Test results show one item
out of 25 failed performance requirements.

Denoting the number of items on test by N, the number of successful
items by s and the number of failed items by x, then:

$\hat{R} = s/N$  (5-14)

or $\hat{R} = (N-x)/N$  (5-15)

and the lower reliability limit satisfies the relation:

$L = \sum_{i=x+1}^{N} \binom{N}{i} R_L^{N-i} (1-R_L)^{i}$  (5-16)

For the data presented:

$\hat{R} = 24/25 = 0.96$

and, solving iteratively for $R_L$ with L = .80,

$0.8 = \sum_{i=2}^{25} \binom{25}{i} R_L^{25-i} (1-R_L)^{i}$,

$R_L = 0.8849$

This value for $R_L$ can also be read directly from figure E-3 in
Appendix E.

Note: The data for this case were actually Monte-Carlo simulated with
a true R = 0.91.

Example, Cyclic Data:

A replaceable item is put on test for 100 cycles. Each cycle is a
success if the item is operating after the cycle. The test results
produce 1 failure in the 100 cycle test.

Equations 5-15 and 5-16 still apply to this situation, but N becomes
$n_c$, the number of test cycles, s is now the number of successful
cycles, and x the number of failed cycles. Then:

$\hat{R} = 99/100 = 0.99$, and

$R_L = 0.9703$ (from figure E-3)

where L = 0.80

Assume that a mission requires 10 cycles. The probability of
completing a mission is $(p_{cycle})^{10}$, where $p_{cycle}$ is the
probability of completing a cycle. Therefore,

$\hat{p}_{cycle} = \hat{R}$ and $p_{cycle,L} = R_L$

then $\hat{R}_{Mission} = (0.99)^{10} = 0.9044$

and $R_{L,Mission} = (0.9703)^{10} = 0.7397$

[Caution:  The  component  lower  bounds  on 
reliability  cannot  be  used  (combined)  in  a 
system  model  to  obtain  a  system  lower 
bound] 
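The lower limits quoted in the two binomial examples can be reproduced by solving equation 5-16 numerically instead of reading figure E-3. The Python sketch below uses plain bisection; the function names and the tolerance are illustrative assumptions.

```python
from math import comb

def upper_tail(n, x, r):
    """P(more than x failures in n trials) when the reliability is r."""
    return sum(comb(n, i) * r**(n - i) * (1 - r)**i
               for i in range(x + 1, n + 1))

def lower_reliability_limit(n, x, confidence, tol=1e-10):
    """Solve equation 5-16 for R_L by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The tail probability falls as r rises, so a tail above the
        # target confidence means the trial value is still too low.
        if upper_tail(n, x, mid) > confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

r_l_bernoulli = lower_reliability_limit(25, 1, 0.80)   # pass/fail example
r_l_cyclic = lower_reliability_limit(100, 1, 0.80)     # cyclic example
mission_point = 0.99 ** 10      # point estimate for a 10-cycle mission
mission_lower = 0.9703 ** 10    # lower limit using the value from the text
print(r_l_bernoulli, r_l_cyclic, mission_point, mission_lower)
```

Raising a single-cycle limit to the tenth power is valid here because the same item and the same confidence statement apply to every cycle; as the caution above notes, this does not extend to combining bounds from different components.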


5.4.1.3 Reliability Assessment for
Normal Times to Failure

A simple example will be given here to illustrate one possible
application of the Normal model (see § 5.2.3).

Example, When Times to Failure are Normally Distributed:

A single replaceable item is put on test until five failures are
observed. The times to failure in missions are:

$t_1 = 2.6003$, $t_2 = 3.1467$, $t_3 = 3.0685$, $t_4 = 3.2501$, and
$t_5 = 1.8684$.

Assume that, even though it is difficult to discriminate with only 5
failures between a normal distribution of failure times and other
distributions, the data are indeed normally distributed.

Then a point estimate of MTTF is:

$\widehat{MTTF} = \frac{1}{x} \sum_{i=1}^{x} t_i$  (5-17)

also, as usual for the Normal:

$\hat{\sigma} = \left[ \frac{1}{x-1} \sum_{i=1}^{x} (t_i - \widehat{MTTF})^2 \right]^{1/2}$  (5-18)

A lower limit at L confidence for MTTF is obtained by:

$MTTF_L = \widehat{MTTF} - t_{L;x-1} \, \hat{\sigma} / \sqrt{x}$  (5-19)

where $t_{L;x-1}$ is the L-percentile of the Student-t distribution
with x-1 degrees of freedom.

For the given data:

$\widehat{MTTF} = 13.934/5 = 2.7868$ missions,

$\hat{\sigma} = 0.57048$ and

$MTTF_L = 2.7868 - (1.533)(0.57048)/2.2361 = 2.3957$ missions

where L = 0.80.

A simplistic approach to find an 80% lower bound on reliability is to
assume

$\sigma = \hat{\sigma}$ and MTTF = $MTTF_L$

This leads to

$R_L = \int_{1}^{\infty} \mathrm{Normal}(\mu = 2.3957, \sigma = 0.57048) \, dt = 0.993$
where L = 0.80.

Note: The data were obtained from a Monte-Carlo simulation of normally
distributed times to failure with actual MTTF = 2.5 and actual
$\sigma$ = 0.5
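The numbers in this example can be checked with the Python sketch below, which uses only the standard library. It is a sketch under stated assumptions: the Student-t value 1.533 is copied from the text (in standard tables it is the 0.90 point for 4 degrees of freedom), the reliability bound is taken as the probability of surviving one mission, and the variable names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

# Times to failure of the example, in missions.
times = [2.6003, 3.1467, 3.0685, 3.2501, 1.8684]
x = len(times)
t_value = 1.533  # Student-t value used in the text (x - 1 = 4 d.f.)

mttf = mean(times)                                   # equation 5-17
sigma = stdev(times)                                 # equation 5-18, divisor x - 1
mttf_lower = mttf - t_value * sigma / math.sqrt(x)   # equation 5-19

# Simplistic bound: P(time to failure > 1 mission) with mean MTTF_L.
r_lower = 1.0 - NormalDist(mttf_lower, sigma).cdf(1.0)
print(mttf, sigma, mttf_lower, r_lower)
```

Because the bound substitutes the point estimate of sigma for its true value, it should be read as an approximation rather than an exact confidence statement.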

5.4.2 Variables Reliability Methods
Based on the Normal Distribution

An  important  use  of  the  normal  distribu¬ 
tion  in  the  assessment  of  product  reliability 
is  as  a  basic  PDF  for  the  “variables”  method. 
The  variables  method  applies  when  reliability 
is  best  defined  in  terms  of  performance  var¬ 
iables,  such  as  a  re-entry  angle  of  attack  for 
a  missile,  or  critical  parameters,  such  as  a 
current  or  an  applied  difference  of  potential 
being  within  critical  limits,  rather  than  in 
terms of success or failure attributes. In most cases, the variables
method assumes that the performance parameter is normally distributed
or can be normalized by transformation, with a true unknown mean,
$\mu$, and standard deviation, $\sigma$. At the end of testing, the
initial parameter sample mean, $\bar{X}$, and sample standard
deviation, s, are available to estimate reliability at the desired
confidence level. With these sample statistics, reliability
calculations are performed using tolerance factors available in many
statistical texts (e.g., [11, p. 311-318]).

5.4.2.1 Two-Sided Case

The two-sided case deals with the proportion (reliability) of a
parameter X that lies between $\bar{X} - Ks$ = LCLS and
$\bar{X} + Ks$ = UCLS, at confidence $\gamma$, where LCLS is the lower
critical limit specification, UCLS is the upper critical limit
specification, and $\bar{X}$ and s are measured from test data. It is
assumed that LCLS and UCLS are symmetric about $\bar{X}$.
Mathematically,

$P[P(\bar{X} - Ks < X < \bar{X} + Ks) \geq R] = \gamma$  (5-20)

where R is reliability.

Tables of two-sided tolerance factors [13], tabulated against R and
$\gamma$, allow combinations of R and $\gamma$ to be found which
satisfy (5-20). Appendix E, figures E-4 and E-5, provide tables of
two-sided tolerance factors at the 50 and 80 percent confidence
levels. To illustrate the method, tables are not employed here but an
approximate equation [11] is used. This equation is:

$K_R = K \left[ \chi^2_{\alpha} / (N-1) \right]^{1/2} / \left( 1 + \frac{1}{2N} \right)$  (5-21)

In equation 5-21, $K_R$ is the normal deviate for the reliability
sought, K is the tolerance factor obtained as
K = ($\bar{X}$ - LCLS)/s or (UCLS - $\bar{X}$)/s, N is the test sample
size and $\chi^2_{\alpha}$ is the chi-square percentile with
$\alpha = 1 - \gamma$.

Example,  Two-Sided  Case 

Assume that n = 5, $\bar{X}$ is measured to be 8, s = 0.484,
LCLS = 6, UCLS = 10, then K = 4.09. Assume also that the desired
confidence $\gamma$ is 0.80, then
$\chi^2_{\alpha} = \chi^2_{0.20} = 1.648$ (for 4 degrees of freedom).
Equation 5-21 gives $K_R$ = 2.39, which is the normal deviate
corresponding to $R_L$ = 0.992 at 80% confidence. If another
confidence level were selected, the $R_L$ would not be 0.992. $K_R$,
and hence the reliability, could also be specified, and the confidence
of that reliability computed if desired.

If  LCLS  and  UCLS  are  not  symmetric 
about  X,  the  calculation  of  reliability  and 
confidence  is  much  more  involved  [51]. 

5.4.2.2 One-Sided Case

Often, a parameter will lead to a failure only if it exceeds or is
below a critical value. In this case, a one-sided equation is formed,
either $P[P(X < \bar{X} + Ks) \geq R] = \gamma$ or
$P[P(X > \bar{X} - Ks) \geq R] = \gamma$. Tables of one-sided
tolerance factors [13] tabulated against R and $\gamma$ permit
combinations of R and $\gamma$ to be found which satisfy these
equations. Appendix E presents a table of one-sided tolerance factors
in figure E-5.

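Alternately, an approximation formula [14] can be used, as follows:

$K_R = K - K_\gamma \left( \frac{1}{N} + \frac{K^2}{2f} \right)^{1/2}$  (5-22)

where K = (SL - $\bar{X}$)/s, SL is the one-sided specification limit,
$K_\gamma$ is the normal deviate at confidence $\gamma$, $K_R$ is the
normal deviate for the reliability sought, and f is the number of
degrees of freedom (N - 1 in the example below).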

For example, if K equals 2.31 for N = 10 items on test, and $\gamma$
is selected as 0.95, then $K_\gamma$ = 1.645, and $K_R$ = 1.274. From
a table of normal deviates: R = 0.899 (at 95% confidence).
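The one-sided approximation of equation 5-22 is easy to evaluate without tables. The Python sketch below is illustrative: the function name is hypothetical, the degrees of freedom are assumed to be N - 1 (the value that reproduces the worked example), and normal deviates come from the standard library instead of a printed table.

```python
import math
from statistics import NormalDist

def one_sided_reliability(k, n, confidence):
    """Equation 5-22 with f = n - 1 degrees of freedom.

    Returns the normal deviate K_R and the corresponding reliability.
    """
    k_gamma = NormalDist().inv_cdf(confidence)  # deviate at confidence gamma
    f = n - 1
    k_r = k - k_gamma * math.sqrt(1.0 / n + k**2 / (2.0 * f))
    return k_r, NormalDist().cdf(k_r)

k_r, r = one_sided_reliability(2.31, 10, 0.95)
print(round(k_r, 3), round(r, 3))
```

The same function can be called with other confidence levels to see how quickly the demonstrated reliability falls as the confidence requirement rises.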

5.4.2.3 Demskey's Extension of the Variables
Method

The variables technique (VAEP) proposed by Demskey [14] is applicable
when reliability is defined entirely in terms of the probability that
one or more performance variables will jointly take values within
specified limits. In addition, the distribution of each variable must
be continuous and definable by its low-order moments. Test data
consist of repeated measurements of the variables taken under
conditions pertinent to the mission. The data for each variable are
tested for goodness-of-fit to a normal distribution. If the fit is
unacceptable, various transformations of the data are tried until an
acceptable normal fit is achieved. Then tolerance limit theory is
applied to obtain point and confidence limit estimates of reliability
for each variable. The calculated limits within which a predicted
proportion of a normal population will fall are defined by
$\bar{X} \pm K_{RCnf} \, s$*, where the subscripts denote reliability,
confidence, sample size and degrees of freedom (usually n-1),
respectively, and where s is the sample standard deviation.

Quantitative  data  are  obtained  for  each 
variable  for  each  unit  of  a  test  sample.  The 
data  for  each  variable  are  examined  to  esti¬ 
mate  the  distribution  of  the  parent  popula¬ 
tion.  Examination  may  include  the  prepara¬ 
tion  of  a  frequency  distribution  of  the  sample 
data,  a  plot  of  the  data  on  probability  paper, 
a  test  for  goodness-of-fit  to  an  assumed  dis¬ 
tribution,  or  a  test  for  normality  such  as  the 
W-test [2], [3]. If the parent population is non-normal, a function is
determined and applied to transform the sampling distribution into a
normal distribution. The mean $\mu$ and the standard deviation
$\sigma$ of the normal population are then estimated from the
normalized sample data. Tolerance limit theory for normal
distributions is applied to obtain a point estimate and lower
confidence limit of reliability for each variable, where reliability
of the variable is defined as the proportion of the parent population
within the specified tolerance interval. Finally, the reliability
estimates for all variables are combined by means of a mathematical
model to obtain point and lower confidence limit estimates of device
reliability.

*Tables of $K_{RCnf}$ are provided in Figures E-5 through E-7.

In  many  instances  normally  distributed 
data  are  not  to  be  expected  and  transforma¬ 
tion  is  necessary  or  desirable.  Efforts  to 
establish  or  attain  reasonable  normality  of 
variables  data  are  justified  by  the  mathemat¬ 
ical  efficiency  and  ease  of  application  of  that 
distribution,  by  the  availability  of  published 
tolerance  limit  factors  for  that  distribution, 
and  because  many  statistical  tests  of  hypoth¬ 
eses  presume  normality  of  the  parent  popula¬ 
tions. 

Nevertheless, instances arise when the errors associated with the use
of normal theory are unacceptable and it is necessary to work directly
with untransformed data. Moreover, tolerance intervals can be computed
directly for many distributions of interest, including the
exponential, Weibull, extreme value, Cauchy and logistic [7]. Thus, if
there is an a priori reason to believe that data are distributed
according to one of these forms, transformation may not be necessary.
Non-parametric (distribution free) tolerance intervals may also be
computed using tables provided by Somerville [8]. However, reliability
estimates based on non-parametric tolerance intervals will be less
precise than those developed from data derived from known
distributions and should be used only if all else fails.

5.4.2.3.1 Model for Independent Variables

Let the reliability of an equipment be determined by the numerical values taken by each of m continuously distributed independent variables $X_1, X_2, \ldots, X_m$ with lower and upper specification limits $L_i$ and $U_i$ respectively, where i = 1, 2, ..., m. The equipment fails if any of the m variables falls outside its specification limits. The general reliability series model then is:

$$R = [P(L_1 < X_1 < U_1)] \cdot [P(L_2 < X_2 < U_2)] \cdots [P(L_m < X_m < U_m)]$$ (5-23)

or

$$R = \prod_{i=1}^{m} R_i$$ (5-24)

Omission  of  one  or  more  relevant  param¬ 
eters  from  the  model  will  usually  result  in 
optimistically  high  reliability  estimates;  con¬ 
versely,  inclusion  of  non-relevant  parameters 
in  the  model  will  often  result  in  underesti¬ 
mating  the  reliability. 

If an equipment is in a success state when any one of its m variables takes a value within specified limits, the reliability model is that of a parallel system:

$$R = 1 - \prod_{i=1}^{m} (1 - R_i)$$ (5-25)
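The series model (5-24) and the parallel model (5-25) can be sketched in a few lines. The function names and the two variable reliabilities used below are illustrative assumptions and are not taken from the manual.

```python
from math import prod


def series_reliability(r):
    """Equation 5-24: the equipment fails if any variable is out of limits."""
    return prod(r)


def parallel_reliability(r):
    """Equation 5-25: the equipment succeeds if any one variable is in limits."""
    return 1.0 - prod(1.0 - ri for ri in r)


# Hypothetical variable reliabilities, for illustration only.
variable_reliabilities = [0.99, 0.95]
print(series_reliability(variable_reliabilities))
print(parallel_reliability(variable_reliabilities))
```

For the same inputs the series estimate can never exceed the parallel estimate, which is a convenient sanity check on the choice of model.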

5.4.2.3.2 Model for Dependent Variables

When reliability is high and t = 1 mission, $\hat\lambda_i \approx 1 - \hat R_i$. For this case equipment reliability, when the variables are correlated (dependent), is estimated by the model

(5-26)

where:

m = number of performance variables in the model.

$\rho_{ij}$ = significant simple coefficient of correlation between the ith and jth variables.

$\rho_{i.jk}$ = significant coefficient of multiple correlation for the ith variable, holding constant the effects of the jth, kth, etc. variables. It may be interpreted as the simple coefficient of correlation between the actual values of the ith variable and those predicted by regressing that variable against all the others.

In the case where the model variables are mutually independent the correlation coefficients are all identically zero and the model reduces to (see equation 5-24):

$$\hat R = \prod_{i=1}^{m} \hat R_i = \prod_{i=1}^{m} (1 - \hat\lambda_i)$$ (5-27)

5.4.2.3.3 Computation of Tolerance Factors

The limits within which a predicted proportion of a normal population will fall are defined by $\bar X \pm K_{RCnf}\,s$, where $\bar X$ is the sample mean, the subscripts denote, respectively, reliability, confidence, sample size and degrees of freedom (usually n-1), and s is the sample standard deviation:

$$s = \sqrt{\frac{\sum (X_i - \bar X)^2}{n-1}}$$ (5-28)

These limits converge to the familiar $\bar X \pm Ks$ for large samples, where the unsubscripted K is the standard normal deviate. A tolerance factor may be estimated for a given specification limit (SL) by:

$$K_{RCnf} = \frac{SL - \bar X}{s}$$ (5-29)

One-sided factors can be computed using the relationships:

where $K_R$ and $K_C$ are, respectively, the standard normal deviates for reliability R and confidence C, n is sample size and f is degrees of freedom. Two-sided factors are obtained using:

where $\chi^2$ is the chi-square statistic for confidence C with f degrees of freedom. In the absence of tolerance limit tables* [20], these equations can be used to estimate reliability or confidence by solving for $K_R$ or $K_C$ and referring these computed statistics to published tables of the standard normal distribution or chi-square distribution.

*Limited one-sided and two-sided tolerance factors are available in Appendix E.

Having computed $K_{RCnf}$ from test data by means of equation 5-29, point and interval reliability estimates for the ith model variable are computed by reference to tabular values of $K_{RCnf}$. $\hat R_i$ is the reliability tabulated for $K_{RCnf}$ at confidence C = .50. $R_{iL}$ is the reliability tabulated at confidence C = .80.
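The tolerance factor of equation 5-29 is a single division, sketched below. The function name is an assumption. The inputs are the action-time summary statistics quoted in the rocket motor example of this section (mean 120.11, standard deviation 5.81, specification limits 100 and 140).

```python
def tolerance_factor(spec_limit, sample_mean, sample_sd):
    """Equation 5-29: K = (SL - mean) / s for one specification limit."""
    return (spec_limit - sample_mean) / sample_sd


# Action-time statistics quoted in the worked example of this section.
mean_t, sd_t = 120.11, 5.81
k_upper = tolerance_factor(140.0, mean_t, sd_t)
k_lower = tolerance_factor(100.0, mean_t, sd_t)
print(round(k_upper, 3), round(k_lower, 3))
```

The magnitude of each factor is then referred to the tables of tolerance factors to read off the reliability at the chosen confidence.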

Then a series subsystem point estimate $\hat R$ is obtained from equation 5-26 or 5-27 depending on whether the dependent or independent model is used. A confidence interval estimate for a dependent serial system is:

$$R_L = \hat R - \hat R \left[ \sum_{i=1}^{m} \frac{d_i^2}{\hat R_i^2} + 2 \sum_{i<j} \rho_{ij} \frac{d_i}{\hat R_i} \frac{d_j}{\hat R_j} \right]^{1/2}$$ (5-32)

For an independent serial system:

$$R_L = \hat R - \hat R \left[ \sum_{i=1}^{m} \frac{d_i^2}{\hat R_i^2} \right]^{1/2}$$ (5-33)

In both of the above:

$$d_i = \hat R_i - R_{iL}$$ (5-34)

For a parallel system of identical independent elements, the point estimate is:

$$\hat R = 1 - \prod_{i=1}^{n} \hat\lambda_i$$ (5-35)

and the confidence limit estimate is

$$R_L = \hat R - n \hat\lambda^{\,n-1} d_i$$ (5-36)
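The lower confidence limit for a serial system can be sketched as follows. The sketch assumes the form R_L = R - R * sqrt(sum of (d_i/R_i)^2 plus 2 * rho_ij * (d_i/R_i) * (d_j/R_j) over pairs), with d_i = R_i - R_iL, which reproduces the arithmetic of the rocket motor example in this section. The function and argument names are illustrative.

```python
from math import sqrt


def series_lower_limit(r_hat, r_i, r_il, rho=None):
    """Lower confidence limit for a serial system.

    r_hat : system point estimate
    r_i   : point estimates for each variable
    r_il  : lower confidence limits for each variable
    rho   : optional correlation matrix; None means independent variables
    """
    ratios = [(ri - ril) / ri for ri, ril in zip(r_i, r_il)]  # d_i / R_i
    total = sum(q * q for q in ratios)
    if rho is not None:
        m = len(ratios)
        for i in range(m):
            for j in range(i + 1, m):
                total += 2.0 * rho[i][j] * ratios[i] * ratios[j]
    return r_hat - r_hat * sqrt(total)


# Values quoted in the worked example (action time T and log pressure Z).
rho = [[1.0, 0.4014], [0.4014, 1.0]]
dependent = series_lower_limit(0.9973, [0.9991, 0.998], [0.9983, 0.996], rho)
independent = series_lower_limit(0.9973, [0.9991, 0.998], [0.9983, 0.996])
print(round(dependent, 4), round(independent, 4))
```

Positive correlation widens the interval, so the dependent limit falls below the independent one for the same inputs.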


5.4.2.3.4 Outliers

Data classified as outliers by the Dixon test [9] or similar tests, and not explainable as test errors, can be treated as observations from subpopulations. When there are two or more outlier data points, the mean and variance of the subpopulation are estimated in the usual manner. A single datum is taken as an estimate of the subpopulation mean; the subpopulation's variance is assumed to be identical to that of the main population. Statistics resulting from the main sample and outlier sample are frequency-weighted.

For a single outlier, the failure rate model is:

$$\hat\lambda = \frac{1}{n}\hat\lambda_o + \frac{n-1}{n}\hat\lambda_p$$ (5-37)

where $\hat\lambda_o$ and $\hat\lambda_p$ are the point failure rate estimates for the outlier and main group respectively. The weighted estimate of failure rate is then used in the point and interval reliability equations exactly as an unweighted estimate would be used.
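The frequency weighting for a single outlier can be sketched as below, assuming the weights 1/n and (n-1)/n described above. The function name and the numerical inputs are hypothetical.

```python
def weighted_failure_rate(n, outlier_rate, main_rate):
    """Frequency-weighted failure rate for one outlier among n observations."""
    return outlier_rate / n + (n - 1) * main_rate / n


# Hypothetical values: 50 observations, one of them an outlier.
combined = weighted_failure_rate(50, outlier_rate=0.02, main_rate=0.001)
print(combined)
```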

Example,  variables  model 

Fifty  solid  rocket  motors  are  test  fired; 
thrust-time  profile  and  chamber  pressure 
(psia)  measurements  are  tabulated  below  as 
peak  pressure  and  action  time  (Figures  5-12 
and  5-13).  The  logarithms  of  peak  pressure 
values  are  added  to  the  tables  for  subsequent 
use  in  developing  reliability  estimates. 

Frequency  distributions  using  ordered  test 
data  are  shown  in  figure  5-14,  parts  1  and  2. 
Action  time  is  plotted  on  normal  probability 
paper,  figure  5-15.  Peak  pressure  is  plotted  on 
log  normal  probability  paper,  figure  5-16. 
Several  statistical  quantities  required  for 
analysis  are  computed. 


When  zero  failures  are  observed,  an  adjust¬ 
ment  can  be  made  to  obtain  a  finite  estimate 
of  variance  and  hence  of  the  confidence  in¬ 
terval.  The  adjustment  is  to  redefine  the  com¬ 
ponent  point  estimate  as  ^  =  n/4  in  the  con¬ 
fidence  interval  equation  5-36. 

Series-parallel  configurations  can  be  treated 
by  stepwise  reduction  to  an  equivalent  series 
model. 


$n_T$ = 50
$\Sigma T$ = 6005.7
$\Sigma T^2$ = 723,024.39
$(\Sigma T)^2/n_T = (6005.7)^2/50$ = 721,368.6498
$S_T^2$ (Sum of Squares) = $\Sigma T^2 - [(\Sigma T)^2/n_T]$ = 1,655.7402
$s_T = \sqrt{1,655.7402/49}$ = 5.81
$\bar T$ = 6005.7/50 = 120.11

Figure 5-12. Sample Test Data (Hypothetical) [columns: Unit; Action Time, T; Peak Pressure, P; Log Peak Pressure, Z]

$n_P$ = 50
$\Sigma P$ = 79603
$\Sigma P^2$ = 133,657,959
$(\Sigma P)^2/n_P = (79603)^2/50$ = 126,732,752
$S_P^2$ (Sum of Squares) = $\Sigma P^2 - [(\Sigma P)^2/n_P]$ = 6,925,207
$s_P = \sqrt{6,925,207/49}$ = 375.9
$\bar P$ = 79603/50 = 1592.1

Figure 5-13. Sample Data and Transformed Data Presented in Order of Magnitude, Low to High

Figure 5-14. Frequency Distributions for Sample Test Data [horizontal axis: action time (seconds)]

Figure 5-15. Normal Probability Plot for Action Time [axis: cumulative percentage; note: straight line represents the parent distribution N(120, 6) from which sample was drawn]

Figure 5-16. Log Normal Probability Plot of Peak Pressure

$n_Z$ = 50
$\Sigma Z$ = 159.57021
$\Sigma Z^2$ = 509.689050550


Interpolating  linearly ,  using  a  Table  of  KRCnf : 


$(\Sigma Z)^2/n_Z = (159.57021)^2/50$ = 509.253038389
$S_Z^2 = \Sigma Z^2 - (\Sigma Z)^2/n_Z$ = .43601216
$s_Z = \sqrt{.43601216/49}$ = .09433
$\bar Z = \overline{\log P}$ = 159.57021/50 = 3.19140
$\Sigma ZT$ = 19,176.764081

where P is peak pressure, T is action time, and Z is log peak pressure.

The degree of correlation between variables Z and T is expressed by the simple correlation coefficient $\rho_{ZT}$ where,

$$\rho_{ZT} = \frac{\Sigma ZT - (\Sigma Z)(\Sigma T)/n}{\sqrt{[\Sigma Z^2 - (\Sigma Z)^2/n] \cdot [\Sigma T^2 - (\Sigma T)^2/n]}}$$ (5-38)

Inserting appropriate quantities from Figure 5-18, we have $\rho_{ZT}$ = .4014.
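The sums-of-products form of equation 5-38 translates directly into code. The sketch below uses a small hypothetical paired sample rather than the fifty-motor data, whose individual paired observations are not listed here; the function name is an assumption.

```python
from math import sqrt


def simple_correlation(z, t):
    """Equation 5-38 evaluated from raw paired observations."""
    n = len(z)
    sum_z, sum_t = sum(z), sum(t)
    sum_zt = sum(a * b for a, b in zip(z, t))
    ss_z = sum(a * a for a in z) - sum_z ** 2 / n  # corrected sum of squares, Z
    ss_t = sum(b * b for b in t) - sum_t ** 2 / n  # corrected sum of squares, T
    return (sum_zt - sum_z * sum_t / n) / sqrt(ss_z * ss_t)


# Hypothetical paired data, for illustration only.
z_values = [1.0, 2.0, 3.0, 4.0, 5.0]
t_values = [2.0, 1.0, 4.0, 3.0, 5.0]
print(simple_correlation(z_values, t_values))
```

Because the formula subtracts two large, nearly equal sums, the intermediate sums should be carried to full precision, as the worked example does.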

All  data  are  tested  for  normality  by  the 
Shapiro  and  Wilks  W-test,  Figure  5-17.  It  is 
concluded  that  the  variables  action  time,  T 
and  log  of  peak  pressure,  Z  are  normally 
distributed. 

Since  the  estimated  correlation  of  action 
time,  T  and  log  peak  pressure  Z  is  statistically 
significant  at  the  .01  significance  level,  the 
correlated  model  is  applied  to  estimate  rocket 
performance  reliability  based  on  T  and  Z 
data. 

Using equation 5-29, compute two one-sided tolerance intervals for action time T.

$$(K_{RCnf})_U = \frac{U_T - \bar T}{s_T} = \frac{140 - 120.11}{5.81} = 3.423$$

$$(K_{RCnf})_L = \frac{L_T - \bar T}{s_T} = \frac{100 - 120.11}{5.81} = -3.461$$

$K_{RCnf}$ and R at confidence C = .50 (computed factor shown between the bracketing tabulated entries):

    K_RCnf      R
    3.0902     .999
    3.423      .9995
    3.7190     .9999

    3.0902     .999
    3.461      .9996
    3.7190     .9999

Excluding .0005 on one side of the distribution and .0004 on the other side, a reliability of $\hat R_T$ = .9991 is estimated for the action time variable. This process is repeated for the 80 percent confidence limit.

$K_{RCnf}$ and R at confidence C = .80:

    K_RCnf      R
    3.4031     .999
    3.423      .9991
    4.0862     .9999

    3.4031     .999
    3.461      .9992
    4.0862     .9999

Excluding .0009 on one side of the distribution and .0008 on the other gives $R_{TL}$ = .9983 at 80 percent confidence for the action time variable.

As an alternative, equation 5-30 could be solved for $K_R$ with respect to each specification limit separately. Rearranging equation 5-30 and referring to a table of the standard normal distribution:

Kr  “  KRCnf  -  Kc  J 

/Khc  1  <5*39> 

'  -TT - 

2f  n 

=  3. 423 -.842 

% 

3.423  1 

98  50 

=  3.320 

The  corresponding  R  for  the  upper  limit  is 
.9995,  about  as  interpolated  previously.  For 
the  lower  limit  of  action  time: 


KRCnf  "  Kc 


/KRCnf 

1 

/  2f 

n 
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1  Pwt  I :  Action  Time,  T  (Seconds)  j 

     k    i   Descending     i   Ascending    a(n-k+1)**
              values of T*       values of T*
     1   50     136.3        1     105.8       .3751
     2   49     131.8        2     108.8       .2574
     3   48     127.8        3     111.9       .2260
     4   47     127.3        4     112.8       .2032
     5   46     126.7        5     113.1       .1847
     6   45     126.6        6     113.7       .1691
     7   44     126.4        7     113.8       .1554
     8   43     126.4        8     113.9       .1430
     9   42     126.0        9     113.9       .1317
    10   41     125.6       10     114.2       .1212
    11   40     124.6       11     114.5       .1113
    12   39     123.4       12     114.8       .1020
    13   38     123.3       13     115.8       .0932
    14   37     123.1       14     116.5       .0846
    15   36     123.1       15     117.2       .0764
    16   35     122.9       16     117.4       .0685
    17   34     122.9       17     118.2       .0608
    18   33     122.5       18     118.2       .0532
    19   32     122.5       19     118.3       .0459
    20   31     122.2       20     118.4       .0386
    21   30     122.0       21     118.6       .0314
    22   29     121.4       22     119.2       .0244
    23   28     121.0       23     119.6       .0174
    24   27     120.7       24     119.8       .0104
    25   26     120.4       25     120.4       .0035

Part 2: Log Peak Pressure (log psia), Z

     k    i   Descending     i   Ascending    a(n-k+1)
              values of Z-3      values of Z-3
     1   50    .43807        1    .03342       .3751
     2   49    .42586        2    .05269       .2574
     3   48    .38166        3    .05538       .2260
     4   47    .34674        4    .06856       .2032
     5   46    .32552        5    .08422       .1847
     6   45    .31366        6    .09096       .1691
     7   44    .31175        7    .09307       .1554
     8   43    .28937        8    .09587       .1430
     9   42    .28058        9    .09899       .1317
    10   41    .26998       10    .10924       .1212
    11   40    .26340       11    .11661       .1113
    12   39    .24920       12    .12872       .1020
    13   38    .24576       13    .13066       .0932
    14   37    .23805       14    .13098       .0846
    15   36    .22660       15    .13481       .0764
    16   35    .22427       16    .13672       .0685
    17   34    .22324       17    .14176       .0608
    18   33    .22063       18    .14208       .0532
    19   32    .21032       19    .14644       .0459
    20   31    .19866       20    .14860       .0386
    21   30    .19811       21    .14860       .0314
    22   29    .19145       22    .15927       .0244
    23   28    .18780       23    .16316       .0174
    24   27    .18526       24    .16376       .0104
    25   26    .18327       25    .16643       .0035

i = 1, 2, ..., n
k = n/2 = 25

$b_T$ = .3751(136.3 - 105.8) + .2574(131.8 - 108.8) + ... + .0035(120.4 - 120.4) = 40.44899

$S_T^2 = \Sigma T^2 - (\Sigma T)^2/n$ = 723,024.39 - 721,368.6498 = 1655.7402

$W_T = b_T^2 / S_T^2 = 40.44899^2 / 1655.7402$ = .988

P(W = .955) = .10
P(W = .974) = .50
and P($W_T$ = .988) > .50.

$b_Z$ = .64385118
$S_Z^2$ = .43601200
$W_Z$ = .9508
P($W_Z$ = .9508) ≈ .075

* from Figure 5-13
** from Hahn, G. and Shapiro, S.S., Statistical Models in Engineering, Wiley and Sons, 1967

Figure 5-17. W-Tests for Normality
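The W statistic of Figure 5-17 can be recomputed from the ordered action-time data and the tabulated coefficients. The sketch below is a minimal illustration; the function and variable names are assumptions, and the data are transcribed from Part 1 of the figure.

```python
# Coefficients a(n-k+1) for n = 50, k = 1..25, as listed in Figure 5-17.
A50 = [0.3751, 0.2574, 0.2260, 0.2032, 0.1847, 0.1691, 0.1554, 0.1430,
       0.1317, 0.1212, 0.1113, 0.1020, 0.0932, 0.0846, 0.0764, 0.0685,
       0.0608, 0.0532, 0.0459, 0.0386, 0.0314, 0.0244, 0.0174, 0.0104,
       0.0035]

# Action time T (seconds) for the fifty motors, in ascending order.
ACTION_TIME = [
    105.8, 108.8, 111.9, 112.8, 113.1, 113.7, 113.8, 113.9, 113.9, 114.2,
    114.5, 114.8, 115.8, 116.5, 117.2, 117.4, 118.2, 118.2, 118.3, 118.4,
    118.6, 119.2, 119.6, 119.8, 120.4, 120.4, 120.7, 121.0, 121.4, 122.0,
    122.2, 122.5, 122.5, 122.9, 122.9, 123.1, 123.1, 123.3, 123.4, 124.6,
    125.6, 126.0, 126.4, 126.4, 126.6, 126.7, 127.3, 127.8, 131.8, 136.3,
]


def w_statistic(sample, coeffs):
    """Return (b, S^2, W) for the W-test, pairing largest with smallest values."""
    x = sorted(sample)
    n = len(x)
    # b = sum of a(n-k+1) * (kth largest - kth smallest)
    b = sum(a * (x[n - 1 - k] - x[k]) for k, a in enumerate(coeffs))
    mean = sum(x) / n
    s2 = sum((v - mean) ** 2 for v in x)  # corrected sum of squares
    return b, s2, b * b / s2


b_t, s2_t, w_t = w_statistic(ACTION_TIME, A50)
print(round(b_t, 5), round(w_t, 3))
```

The resulting W is then compared with tabulated percentage points for the sample size in use; a value above the 50 percent point gives no reason to reject normality.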
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3.461  -  .842 


ll 


3.461 

98 


SO 


=  3.346 


R for the lower limit is .9996 as previously found. Therefore, $\hat R_T$ = .9991, which agrees with the results obtained using tables.

Proceeding to the log pressure variable Z,

$U_Z$ = log 3000 psia = 3.47712

$$(K_{RCnf})_U = \frac{U_Z - \bar Z}{s_Z} = \frac{3.47712 - 3.19140}{.09433} = 3.028$$

Interpolation gives:

$\hat R_Z$ = .998
$R_{ZL}$ = .996 at 80 percent confidence

To estimate motor reliability, equation 5-26 is solved. The approximations $\hat\lambda_T \approx 1 - \hat R_T$ = .0009 and $\hat\lambda_Z \approx 1 - \hat R_Z$ = .002 are used.

m = 2

     i     R_i      λ_i      ρ_ij     ρ_ij λ_i    Σρ_ij/m²
     1    .9991    .0009    .4014     .00036      .2007
     2    .998     .002     .4014     .0008
    Sum            .0029              .00116

$$\hat R = 1 - \sum_{i=1}^{m} \hat\lambda_i + \left(\frac{\sum \rho_{ij}}{m^2}\right)\left(\sum \rho_{ij}\hat\lambda_i\right) = 1 - .0029 + (.2007)(.00116) = .9973$$

The confidence limit is found using equation 5-32.

m = 2

     i     R_i      R_iL     d_i      d_i/R_i
     1    .9991    .9983    .0008    .0008
     2    .998     .996     .002     .002

$R_L$ = .9973 - .9973(.0024) = .9949 at 80% confidence.

5.4.3 Reliability Assessment Method Based on Stress-Strength Interference Models

Reliability models for structures are usually complex models of the Weibull or Lognormal type, or stress-strength models.

5.4.3.1 Stress-Strength Interference, Normal Distribution

In [12], a model of normally distributed stress and strength is presented with an example (see Figure 5-18). In the figure, s is a normally distributed stress with mean $\mu_s$ and standard deviation $\sigma_s$, and v is a normally distributed strength with mean $\mu_v$ and standard deviation $\sigma_v$. The interference area is a high-stress, low-strength area where failure is likely to occur. In the example, burst pressures of rocket chambers may be known to be normally distributed with mean $\mu_v$ = 800 psia and standard deviation $\sigma_v$ = 100 psia. Twenty (20) test firings of a solid-propellant rocket engine are made and the maximum value of chamber pressure is evaluated for each firing. The sample mean measured maximum pressure is found to be $\bar s$ = 400 psia and the sample standard deviation $\hat\sigma_s$ = 25 psia. It is shown in the example that the reliability estimate for the engine is:

$$\hat R = \Phi\left(\frac{\mu_v - \bar s}{\sqrt{\sigma_v^2 + \hat\sigma_s^2}}\right) = 0.9996$$ (5-40)

where Φ is the normal CDF. An approximate lower confidence bound, $R_L$, is given by:

$$R_L = \hat R - K_{0.90}\sqrt{V}$$ (5-41)

where K is the normal deviate for 90% confidence, and V, the variance of $\hat R$, is approximated through a complicated expression (see [12]), valid when R is high. The result is $R_L$ = 0.9993 for a confidence level of 0.90. Additional point and interval reliability estimation examples for structures (normal PDF case) are given in [23].

Figure 5-18. Stress-Strength Interference Probability Density Functions
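The normal stress-strength interference calculation can be sketched with the standard library. The sketch assumes independent, normally distributed stress and strength, so that reliability is the normal CDF of the mean margin divided by the root sum of squares of the two standard deviations. The function name and the numerical inputs are hypothetical and are not the values of the example in [12].

```python
from math import sqrt
from statistics import NormalDist


def interference_reliability(strength_mean, strength_sd, stress_mean, stress_sd):
    """P(strength > stress) for independent normal stress and strength."""
    margin = strength_mean - stress_mean
    spread = sqrt(strength_sd ** 2 + stress_sd ** 2)
    return NormalDist().cdf(margin / spread)


# Hypothetical burst-pressure strength and chamber-pressure stress (psia).
r = interference_reliability(800.0, 100.0, 500.0, 50.0)
print(r)
```

When the two means coincide the function returns one half, and widening either distribution with the margin held fixed lowers the estimate.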

5.4.3.2 Stress-Strength Interference, Weibull and Other Distributions

Reference  [5]  presents  examples  of  relia¬ 
bility  point  and  interval  estimation  for 
Weibull,  lognormal,  and  exponential  stresses 
and  strengths,  and  for  such  combinations  as 
exponential  stress  with  normal  strengths,  nor¬ 
mal  stress  with  exponential  strengths,  and 
normal  stress  with  Weibull  strengths.  Point 
and  interval  estimates  of  reliability  for  these 
cases  are  given  only  for  situations  where  the 
parameters  of  the  distributions  are  known, 
rather  than  estimated  from  test  data. 

5.4.3.3  Time  Dependent  Stress-Strength 
Models 

Models  have  been  developed  to  account 
for  repeated  application  of  stresses,  as  well  as 
change  in  the  distribution  of  strength  caused 
by aging or cumulative damage. Details on these models are given in [5].

5.4.4  Methods  for  Assessing  R  Growth 

When  hardware  (and  software)  elements 
of  a  system  are  modified  throughout  the  test 
program,  as  failure  modes  are  uncovered  by 
testing,  product  reliability  should  show  steady 
growth  which  can  be  measured  and  forecast 
by  a  mathematical  model. 

In [44], criteria are proposed for reliability growth models. They are:


a.  The  model  should  provide  for  reliability 
estimation  and  future  prediction. 

b.  The  model  should  have  minimum  bias 
and  minimum  variance. 

c.  A  method  for  interval  estimates  should 
be  inherent  in  the  model. 

d.  The  model  should  be  relatively  insensi¬ 
tive  to  external  factors  such  as  data  grouping, 
the  range  of  reliability  or  rapidity  of  growth. 

e.  Estimation  techniques  should  be  com¬ 
patible  with  digital  computing  methods. 

f.  The  model  should  reflect  what  is  really 
happening. 

To these desirable features of reliability growth models can be added the quantitative criterion of goodness-of-fit to actual failure data.

In [44], thirty-nine models of reliability growth proposed in the literature were assessed against the qualitative requirements listed above. The principal finding of the report was that the Duane reliability growth model was equal or preferable to all others considered with respect to the given criteria. Reference [45] indicates that while Duane is the preferred model in many situations and is the most generally applicable model, other models may be better in specific applications. The Duane model is discussed here and is defined as:

$$\lambda(t) = K(1-\alpha)t^{-\alpha}$$ (5-42)

where λ(t) is the failure rate at time t, while K and α are parameters to be estimated. Other forms of the Duane model are:

1. $\lambda_\Sigma = Kt^{-\alpha}$, where $\lambda_\Sigma$ = cumulative failure rate, K and α are parameters to be estimated from data, and t = total operating hours, cycles or missions.

2. $q_\Sigma = KN^{-\alpha}$ (applicable to attribute data), where $q_\Sigma$ = cumulative probability of failure, N = number of trials.
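The relationship between the instantaneous and cumulative forms of the Duane model can be sketched as follows. The parameter values K = 0.5 and α = 0.4 are hypothetical, and the function names are assumptions.

```python
from math import log


def cumulative_failure_rate(t, k, alpha):
    """Cumulative failure rate K * t**(-alpha)."""
    return k * t ** (-alpha)


def instantaneous_failure_rate(t, k, alpha):
    """Instantaneous failure rate K * (1 - alpha) * t**(-alpha), equation 5-42."""
    return k * (1.0 - alpha) * t ** (-alpha)


# Hypothetical parameters, for illustration only.
k, alpha = 0.5, 0.4
t1, t2 = 100.0, 1000.0

# Slope of the cumulative failure rate on log-log axes equals -alpha.
slope = (log(cumulative_failure_rate(t2, k, alpha))
         - log(cumulative_failure_rate(t1, k, alpha))) / (log(t2) - log(t1))
print(slope)
```

At any time t the instantaneous rate is the cumulative rate multiplied by (1 - α), so both plot as parallel straight lines on log-log paper.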

The  Duane  growth  model  plots  as  a  straight 
line  on  log-log  paper  (see  §  6.5).  In  order  to 
test  its  validity  for  a  particular  set  of  failure 
data,  it  is  recommended  that  a  Cramer-Von 
Mises  goodness-of-fit  test  be  performed  first. 
The  description  of  the  goodness-of-fit  test  is 
delayed  until  point  and  interval  estimates  are 
presented  in  the  next  paragraph,  because  the 
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test  requires  a  knowledge  of  these  estimates 
before  it  can  be  applied. 

5.4.4.1 Point and Interval Estimates in the Duane Reliability Growth Model

The point and interval estimates presented here are those of the Army Materiel Systems Analysis Agency (AMSAA) model, which is a close relative of the Duane model and is described in [47]. It is shown in [47] that point estimates of 1-α and K may be obtained from the simultaneous solution of the following pairs of equations:

For failure-truncated tests:

$$\hat\alpha = 1 - (x-2) \Big/ \Big[(x-1)\ln t_x - \sum_{i=1}^{x-1} \ln t_i\Big]$$ (5-43)

$$\hat K = x / t_x^{\,1-\hat\alpha}$$ (5-44)

For time-truncated tests:

$$\hat\alpha = 1 - (x-1) \Big/ \Big[x\ln T - \sum_{i=1}^{x} \ln t_i\Big]$$ (5-45)

$$\hat K = x / T^{\,1-\hat\alpha}$$ (5-46)

In the expressions above, x is the total number of failures, $t_i$ the time to failure for the ith failure, and T the test duration.

The expressions above are valid even for small samples. They yield a point estimate of failure rate in the Duane model given by:

$$\hat\lambda(t) = \hat K(1-\hat\alpha)t^{-\hat\alpha}$$ (5-47)

According to [47], an estimate of the MTBF, θ(t), is

$$\hat\theta(t) = 1/\hat\lambda(t)$$ (5-48)

A lower bound on MTBF, $\theta_L$, can be calculated [48] from the two-sided tables of [47], which are reproduced in figures 5-19 and 5-20, as follows:

In order to find a γ lower bound on θ, $\theta_L$, for instance, one should take the appropriate L factor from the 2γ-1 column to obtain:

$$\theta_L(t)_\gamma = L_{x,\,2\gamma-1}\,\hat\theta(t)$$ (5-49)

If, for instance, 10 failures have occurred in failure-terminated tests, then:

$$\theta_L(t)_{0.90} = L_{10,\,0.80}\,\hat\theta(t) = 0.6852\,\hat\theta(t)$$

Example, Duane Growth Model for Time Truncated Tests.

Figure 5-21 shows synthetic failure data with corresponding calculations of $\hat\theta(t)$ and $\theta_L(t)$. It was assumed that testing was time truncated, with T taken after each failure halfway to the next failure. The estimation was obtained with equations 5-45 and 5-48. Notice that while the estimated α and K show increasing stability with x, indicating the possibility of an adequate fit to the Duane model (see § 5.4.4.2 for a confirmation), $\hat\theta$ and $\theta_L$ show the increased values with time expected from the achievement of reliability growth.

5.4.4.2 Goodness-of-Fit to Duane Reliability Growth Model

A Cramer-Von Mises goodness-of-fit test for the Duane R growth model is presented in [47]. The test statistic is, for failure-truncated tests:

$$C^2_{x-1} = \frac{1}{12(x-1)} + \sum_{i=1}^{x-1}\left[\left(\frac{t_i}{t_x}\right)^{1-\hat\alpha} - \frac{2i-1}{2(x-1)}\right]^2$$ (5-50)

and for time-truncated tests:

$$C^2_{x} = \frac{1}{12x} + \sum_{i=1}^{x}\left[\left(\frac{t_i}{T}\right)^{1-\hat\alpha} - \frac{2i-1}{2x}\right]^2$$ (5-51)

The critical values of $C^2$ are shown in figure 5-22.

Example, Goodness of Fit to Duane Model For Time Truncated Test Data

Equation 5-51 was used on the data displayed in figure 5-21 to generate the test
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■ 

           .80              .90              .95              .98
     x     L       U        L       U        L       U        L       U
     2   .8065   33.76    .5552   72.67    .4099   151.5    .2944   389.9
     3   .6840   8.927    .5137   14.24    .4054   21.96    .3119   37.60
     4   .6601   5.328    .5174   7.651    .4225   10.65    .3368   15.96
     5   .6568   4.000    .5290   5.424    .4415   7.147    .3603   9.995
     6   .6600   3.321    .5421   4.339    .4595   5.521    .3815   7.388
     7   .6656   2.910    .5548   3.702    .4760   4.595    .4003   5.963
     8   .6720   2.634    .5668   3.284    .4910   4.002    .4173   5.074
     9   .6787   2.436    .5780   2.989    .5046   3.589    .4327   4.469
    10   .6852   2.287    .5883   2.770    .5171   3.286    .4467   4.032
    11   .6915   2.170    .5979   2.600    .5285   3.054    .4595   3.702
    12   .6975   2.076    .6067   2.464    .5391   2.870    .4712   3.443
    13   .7033   1.998    .6150   2.353    .5488   2.721    .4821   3.235
    14   .7087   1.933    .6227   2.260    .5579   2.597    .4923   3.064
    15   .7139   1.877    .6299   2.182    .5664   2.493    .5017   2.921
    16   .7188   1.829    .6367   2.144    .5743   2.404    .5106   2.800
    17   .7234   1.788    .6431   2.056    .5818   2.327    .5189   2.695
    18   .7278   1.751    .6491   2.004    .5888   2.259    .5267   2.604
    19   .7320   1.718    .6547   1.959    .5954   2.200    .5341   2.524
    20   .7360   1.688    .6601   1.918    .6016   2.147    .5411   2.453
    21   .7398   1.662    .6652   1.881    .6076   2.099    .5478   2.390
    22   .7434   1.638    .6701   1.848    .6132   2.056    .5541   2.333
    23   .7469   1.616    .6747   1.818    .6186   2.017    .5601   2.281
    24   .7502   1.596    .6791   1.790    .6237   1.982    .5659   2.235
    25   .7534   1.578    .6833   1.765    .6286   1.949    .5714   2.192
    26   .7565   1.561    .6873   1.742    .6333   1.919    .5766   2.153
    27   .7594   1.545    .6912   1.720    .6378   1.892    .5817   2.116
    28   .7622   1.530    .6949   1.700    .6421   1.866    .5865   2.083
    29   .7649   1.516    .6985   1.682    .6462   1.842    .5912   2.052
    30   .7676   1.504    .7019   1.664    .6502   1.820    .5957   2.023
    35   .7794   1.450    .7173   1.592    .6681   1.729    .6158   1.905
    40   .7894   1.410
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Figure  5-19.  Confidence  Intervals  for  MTBF  in  the  Duane  Growth  Model  from  Failure  Terminated  Test 
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Figure  5-20.  Confidence  Intervals  for  MTBF  in  the  Duane  Growth  Model  from  Time  Terminated  Test 
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testing  at  various  assembly  levels.  The  equa¬ 
tions  given  in  the  following  subparagraphs 
are  derived  in  [10]. 

5.4.5.1 Bias Arising from Mixed Truncation of Tests

Tests may terminate either by failure or accumulation of planned test times. Generally, both truncation policies are employed, so that test data for components in a system are neither completely Poisson, nor completely binomial. Under these conditions, the component failure rate estimate, $\hat\lambda = \Sigma X / \Sigma t$, where ΣX is total observed failures and Σt is total actual test duration in equivalent missions, is biased. The bias is most significant early in a test program when data are few. The estimate given below, equation 5-52, contains a provision for correcting this bias.

5.4. 5. 2  Unbiased  Estimation  of  Failure  Rates 
and  Their  Variances 

An  estimate,  which  contains  a  bias  correc¬ 
tion  factor,  is: 


Figure  5-21.  Reliability  Growth  Failure  Data  and 
Estimated  Parameters 

statistic $C^2$, shown in figure 5-23. As can be seen from the fact that the calculated values of $C^2$ in figure 5-23 are all smaller than the 0.01 critical values of $C^2$ in figure 5-22, the goodness-of-fit hypothesis is not rejected at the 0.99 significance level for large values of x.
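The AMSAA point estimates and the Cramer-Von Mises statistic for a time-truncated test can be sketched as below. The sketch assumes the standard AMSAA time-truncated forms: α = 1 - (x-1)/(x ln T - Σ ln t_i), K = x / T^(1-α), and C² = 1/(12x) + Σ[(t_i/T)^(1-α) - (2i-1)/(2x)]². The failure times, the test duration and all function names are hypothetical.

```python
from math import log


def amsaa_time_truncated(times, total_time):
    """Point estimates (alpha, K) for a time-truncated test."""
    x = len(times)
    denom = x * log(total_time) - sum(log(t) for t in times)
    alpha = 1.0 - (x - 1) / denom
    k = x / total_time ** (1.0 - alpha)
    return alpha, k


def failure_rate(t, k, alpha):
    """Instantaneous failure rate K * (1 - alpha) * t**(-alpha)."""
    return k * (1.0 - alpha) * t ** (-alpha)


def cvm_time_truncated(times, total_time, alpha):
    """Cramer-Von Mises goodness-of-fit statistic for a time-truncated test."""
    x = len(times)
    ordered = sorted(times)
    return 1.0 / (12 * x) + sum(
        ((t / total_time) ** (1.0 - alpha) - (2 * i - 1) / (2 * x)) ** 2
        for i, t in enumerate(ordered, start=1)
    )


# Hypothetical failure times (hours) observed in a 1000-hour test.
failure_times = [5.0, 40.0, 100.0, 250.0, 600.0]
test_duration = 1000.0

alpha_hat, k_hat = amsaa_time_truncated(failure_times, test_duration)
mtbf_hat = 1.0 / failure_rate(test_duration, k_hat, alpha_hat)
c2 = cvm_time_truncated(failure_times, test_duration, alpha_hat)
print(alpha_hat, k_hat, mtbf_hat, c2)
```

The computed C² is compared with the tabulated critical value for the same number of failures; the fit is rejected only when the computed value exceeds the critical value. A lower bound on MTBF follows by multiplying the point estimate by the tabulated L factor.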

5.4.5  The  Rubinstein  Method 

This  method  termed  the  Basic  Method  in 
revision  A  of  this  manual,  was  developed  by 
David  Rubinstein  in  1958.  It  treats  the  mixed- 
censoring  one-component-at-a-time  test  case, 
and  adjusts  for  multiple  environments  and  for 


Equation 5-52 is valid when the failure rate is small. The summation

$$\sum_{i=1}^{N} T_i$$

is the sum of the planned test times for the N tests. As an example, assume three units of a component are tested. The stress applied during the test will occur for 6 minutes of the design mission. Thus, 1 hour of test time is equivalent to 10 missions (a = 10). Two of
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Figure  5-24.  Hypothetical  Set  of  Test  Data 


A simpler approximation to the bias correction factor, which gives a slightly conservative result (exact when all $T_i$ are equal), does not require storage of the planned test time. This approximation of equation 5-52 is:


(i),  and  combinations  of  environments  (j)  and 
test  states  (k)  (see  §  4.1.2).  If  the  j-environ- 
ments  and  the  k-test  states  used  to  test  a 
given  component  are  representative  of  oper¬ 
ational  environments  and  demands,  the  oper¬ 
ational  failure  rate,  X,  of  a  given  component 
can  be  estimated  by  use  of  equation  5-54, 
where  i  is  a  constant  and  identifies  the  com¬ 
ponent  of  interest. 

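$$\hat\lambda_i = \sum_j \sum_k \hat\lambda_{ijk}$$ (5-54)

Equation 5-55 can be used in a similar manner to calculate the variance of the estimated λ for component i.

$$\hat\sigma^2_{\hat\lambda_i} = \sum_j \sum_k \hat\sigma^2_{\hat\lambda_{ijk}} = \sum_j \sum_k \frac{\hat\lambda_{ijk}}{t_{ijk}}$$ (5-55)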


N 


i  =  I 


where  N  is  the  number  of  tests. 
Since 


(5-53) 


0.857. 


Then: 

X  =  (0.01 )  (0.857)  =  0.00857  failures/mission. 

The  variance  of  the  unbiased  estimate  is  esti¬ 
mated  by: 


cl 


N 


I  t 

i- 1 


.00857 

100 


=  .0000857 


5.4.5.3 Combining Estimates from Different Environments and Test States

Failure rates and variance of estimate are computed individually for each component

Finally, component reliability estimates are computed using the summed failure rate estimates:

$$\hat R_i = e^{-\hat\lambda_i}$$ (5-56)
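Summing the failure rates over environments and test states and converting the total to a reliability can be sketched as follows. The sketch assumes the exponential form R = exp(-λ) with λ expressed per mission. The dictionary keys and the numerical rates are hypothetical.

```python
from math import exp

# Hypothetical failure rates (failures per mission) for one component,
# keyed by (environment, test state).
rates = {
    ("high temperature", "non-operating"): 0.0004,
    ("high temperature", "operating"): 0.0011,
    ("vibration", "non-operating"): 0.0002,
    ("vibration", "operating"): 0.0023,
}


def component_failure_rate(rates_by_condition):
    """Sum the failure rate estimates over all environments and test states."""
    return sum(rates_by_condition.values())


def component_reliability(rates_by_condition):
    """One-mission reliability from the summed failure rate."""
    return exp(-component_failure_rate(rates_by_condition))


print(component_failure_rate(rates), component_reliability(rates))
```

For small summed rates the result is close to one minus the sum, which is the approximation used elsewhere in this section when reliability is high.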

5.4.5.4 Confidence Limits on Failure Rates and Reliability

All statistical estimates based on sampling are subject to uncertainty; therefore, it is necessary to calculate confidence limits. Such calculations are not readily amenable to shifts from one assembly level to another, so confidence limits for components are difficult to translate to higher levels. The equation given here for confidence limits addresses this problem. It is based on normal distribution theory and has been corrected to compensate for the fact that failure rate, λ, is usually a small value.

In general, only upper confidence limits are of interest for failure rates. The upper limit for a component is computed as:

$$\lambda_{ui} = \hat\lambda_i + \frac{(\beta K)^2 C_i + \sqrt{4\hat\lambda_i(\beta K)^2 C_i + (\beta K)^4 C_i^2}}{2}$$ (5-57)
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where 


3 


and  K  is  the  standard  normal  deviate  for 
specified  confidence  level  (e.g.,  0.842  for  80% 
confidence,  1.282  for  90%  confidence). 

While normal distribution theory is not completely appropriate for small values of λ, the procedures given above compensate for that difficulty. However, as the confidence level is reduced toward 50%, an additional modification becomes appropriate. This bias correction, called β, is particularly important when few failures have been observed. Thus, the product βK is used rather than K in the equations. For 100α% confidence and X failures, β is computed from the relationship:
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Figure  5-25.  Bela  Correction  Factors  for  80  Percent 
Confidence 

Xu  is  the  larger  of  equations  5-57  and  5-60 
using  the  smallest  tyk  with  zero  failures. 


where

x_u = 0.5 \chi^2_{\alpha}

Then,

\beta = \frac{x_u - x}{K \sqrt{x_u}} = \frac{0.5\chi^2_{\alpha} - x}{K \sqrt{0.5\chi^2_{\alpha}}}    (5-59)


For example, given zero failures and a specified
confidence level of 80 percent, it is
quickly found, using tables (e.g., Appendix E,
figure E-1):

x_u = 0.5 \chi^2_{0.80} = (0.5)(3.219) = 1.6095

and

\beta = \frac{1.6095 - 0}{0.842 \sqrt{1.6095}} = 1.507
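
The zero-failure example can be checked numerically. The following is a minimal sketch, not part of the manual: it assumes the chi-square value is taken at two degrees of freedom (which reproduces the tabulated 3.219 at 80 percent), and the function name beta_correction is illustrative only.

```python
import math

def beta_correction(x_failures, chi2_value, k):
    """Bias correction factor beta = (x_u - x) / (K * sqrt(x_u)), with x_u = 0.5 * chi-square."""
    x_u = 0.5 * chi2_value
    return (x_u - x_failures) / (k * math.sqrt(x_u))

# For 2 degrees of freedom the chi-square quantile has a closed form:
# chi2 = -2 * ln(1 - confidence).  (Assumption: 2 d.f. applies to the zero-failure case.)
confidence = 0.80
chi2_80 = -2.0 * math.log(1.0 - confidence)
K_80 = 0.842  # standard normal deviate for 80% confidence, as quoted in the text

beta = beta_correction(0, chi2_80, K_80)
print(f"chi-square = {chi2_80:.3f}, beta = {beta:.3f}")
```

With other failure counts or confidence levels, the chi-square value would have to come from tables such as Appendix E rather than from the closed form used here.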


5.4.5.5 Unequal Numbers of Tests in
Multiple Environments

The Rubinstein method can be used to
assess reliability when data come from different
test states and multiple environments, as
shown in the following example:

Two stress environments are defined: high
temperature (h) and vibration (v). In each of
these environments a component is tested
non-operating (a) and operating (d). Figure
5-26 summarizes the mission exposure times
for the component, where α is the factor
(missions per minute) that converts test time
to equivalent missions.


Environment        Test State          Mission Exposure    α
                                       Time (Min)          (Missions/Min)

High               Non-operating (a)       10.00               0.10
Temperature (h)    Operating (d)            0.50               2.00

Vibration (v)      Non-operating (a)        0.25               4.00
                   Operating (d)           20.00               0.05

In figure 5-25, values of β for 80% confidence
are listed.

When  no  failures  have  occurred,  the  upper 
limit  of  failure  rate  is  defined  as: 


x  s  mi— 

ui  smallest 


(5-60) 


Figure 5-26. Mission Exposure Times for a
Component by Environment and Test
States

For  the  component  in  high  temperature 
and  non-operating  (ha)  four  tests  are  per¬ 
formed  with  the  following  durations  (in 
equivalent  missions)  and  the  following  results. 
Test 1: Component failed after 142 minutes
of a scheduled 300 minute test (14.2
equivalent missions).

Test  2:  Component  did  not  fail  in  300 
minutes  of  testing  (30  equivalent  missions). 

Test  3:  Component  failed  after  147  min¬ 
utes  of  a  scheduled  300  minute  test  (14.7 
equivalent  missions). 

Test  4:  Component  did  not  fail  in  300  min¬ 
utes  of  testing  (30  equivalent  missions). 
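
The conversion from test minutes to equivalent missions can be sketched as follows. This is an illustrative example only; it assumes, as the figure 5-26 values suggest, that α is the reciprocal of the mission exposure time, and the dictionary and function names are not from the manual.

```python
# Mission exposure times (minutes) per test state, from figure 5-26.
EXPOSURE_MIN = {"ha": 10.00, "hd": 0.50, "va": 0.25, "vd": 20.00}

# alpha = missions per minute (assumed here to be 1 / mission exposure time).
ALPHA = {state: 1.0 / minutes for state, minutes in EXPOSURE_MIN.items()}

def equivalent_missions(test_minutes, state):
    """Convert accumulated test time in a given state to equivalent missions."""
    return test_minutes * ALPHA[state]

# (minutes on test, failed?) for the four tests in state "ha".
ha_tests = [(142, True), (300, False), (147, True), (300, False)]

missions = [equivalent_missions(minutes, "ha") for minutes, _ in ha_tests]
failures = sum(1 for _, failed in ha_tests if failed)
total_missions = sum(missions)

print([round(m, 1) for m in missions], failures, round(total_missions, 1))
```

The failure rate estimate itself follows equation 5-53, which is outside this sketch.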

These tests are summarized on the first line
of figure 5-27. Three other sets of tests, in
test states hd, va, and vd, were performed on
the component. The test data are summarized
in figure 5-27. The λ̂'s are calculated using
equation 5-53.

Figure 5-27. Test Data Summary.

Estimates  of  the  combined  failure  rate  and  re¬ 
liability  are: 


Point  and  interval  reliability  estimates  for 
binomial  test  data  (defective  vs  non-defec¬ 
tive  classification  of  test  results)  can  also  be 
derived  using  the  Rubinstein  Method. 

5.4.6  Bayesian  Methods  and  Their  Statistical 
Formulation 

In  principle,  all  of  the  component  failure 
distribution  models  discussed  herein  can  be 
used  for  Bayesian  reliability  assessment.  Tradi¬ 
tional  (frequentist)  statisticians  consider  the 
parameters  of  these  distributions  as  fixed, 
though  unknown,  and  estimable  if  sufficient 
test  data  are  available.  In  Bayesian  estimation 
a  degree  of  belief  viewpoint  about  the  possi¬ 
ble  values  the  distribution  parameters  can 
assume  replaces  the  frequentist’s  assumption 
of  fixed  parameters.  This  degree  of  belief 
can  be  based  on  engineering  judgment  or  on 
previous  data  in  accordance  with  the  rules 
shown in § 5.4.6.9. It is embodied in the
values  given  “prior”  parameters  of  a  “prior” 
distribution,  values  which  reflect  the  “weak¬ 
ness”  or  “strength”  of  the  degree  of  belief. 

As  test  data  become  available,  the  prior 
distribution  is  modified  by  the  data  to  form  a 
posterior  distribution  by  the  use  of  Bayes’ 
Theorem.  Bayes’  Theorem  can  be  written 
generally  as: 

Pr(θ|u) = Pr(u|θ) · Pr(θ)/Pr(u)    (5-61)


\hat{\lambda} = .02 + .01 + .01 + .00 = .04

R(Mission) = e^{-.04} = 0.9608
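
As a quick check of the arithmetic, the sketch below sums the four failure rate estimates and applies the exponential law. The list simply repeats the values quoted in the text; which value belongs to which test state is not assumed here.

```python
import math

# Failure rate estimates (failures per equivalent mission) for the four test states.
failure_rates = [0.02, 0.01, 0.01, 0.00]

combined_rate = sum(failure_rates)
mission_reliability = math.exp(-combined_rate)  # exponential failure law, one mission

print(f"lambda = {combined_rate:.2f}, R(Mission) = {mission_reliability:.4f}")
```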

Underlying these estimates are the assumptions
that the component is described by the exponential
failure law, and that it is permissible
to add the failure rates (i.e., the failure rates
in each environment and test state are independent).
The assumptions are set forth and
fully discussed in [10].

The  method  also  derives  lower  reliability 
confidence  bounds  (LRCB)  for  components 
(and  series-parallel  groups  of  components) 
using  an  approximate  method  consistent  with 
the  mixed  censoring,  one-component-at-a- 
time  method  of  testing  (see  Section  7). 


In reliability applications, θ is an unknown
parameter of a failure model or a function of
the unknown parameter, such as reliability
itself. Pr(θ) is some prior probabilistic degree
of belief about a value or a set of values of
the parameter θ. u represents a statistical summary
of failure data from a test, such as a
combination of ordered test failure times
t_1, t_2, t_3, ..., t_n, or the number of failures x
among n units tested. Pr(u|θ) is the conditional
probability that u is observed, given a particular
value of θ. Pr(u) is the unconditional
probability of u, and Pr(θ|u) is the posterior
probability of θ given that u has occurred.
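
A small numerical illustration of Bayes' Theorem with discrete prior probabilities may help. The sketch below is hypothetical: θ is taken to be component reliability, u is "x failures among n units tested" with a binomial likelihood, and the candidate values and prior probabilities are invented for illustration.

```python
from math import comb

def discrete_posterior(thetas, priors, n, x):
    """Return Pr(theta | u) for u = x failures among n units; theta is reliability."""
    # Pr(u | theta): binomial probability of x failures in n trials.
    likelihoods = [comb(n, x) * (1.0 - t) ** x * t ** (n - x) for t in thetas]
    joint = [lik * p for lik, p in zip(likelihoods, priors)]
    pr_u = sum(joint)  # unconditional probability Pr(u)
    return [j / pr_u for j in joint]

thetas = [0.90, 0.95, 0.99]   # hypothetical candidate reliabilities
priors = [0.2, 0.3, 0.5]      # hypothetical prior degrees of belief (sum to 1)

posterior = discrete_posterior(thetas, priors, n=20, x=0)
for t, p0, p1 in zip(thetas, priors, posterior):
    print(f"theta = {t:.2f}: prior {p0:.2f} -> posterior {p1:.3f}")
```

Twenty failure-free trials shift belief toward the highest candidate reliability, as the theorem requires.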

Although Bayes’ Theorem can be used as
stated with prior probabilities ascribed to
discrete values of θ, in many cases of interest
prior probabilities are described as a continuous
PDF g(θ) over all possible θ.
The form taken by Bayes’ formula when
u = (t_1, t_2, ..., t_n) is a set of times to failure,
for  instance  as  in  (11: 

g(θ|u) = f(u|θ) g(θ)/f(u)    (5-62)

where: f(u|θ) is the joint PDF of a sample
of size n from f(t|θ), the sampling PDF of
the random variable t (time to failure). g(θ)
is the prior PDF on θ, also called the mixing
or the compounding PDF. f(u) is the joint
density of the sample observations and is
equal to \int f(u,θ) dθ, taken over all θ.
g(θ|u) is the posterior PDF of θ and is equal
to f(u,θ)/f(u).

5.4.6.1 Priors and Posteriors for Reliability
Model Parameters


5.4.6.3  The  Uniform  Prior 

Selection of a uniform prior on R, expressed as:

g(R) = \begin{cases} 1, & 0 < R < 1 \\ 0, & \text{elsewhere} \end{cases}    (5-64)

presumes  complete  ignorance,  that  is,  that 
nothing  is  known  a  priori  about  R,  so  R  can 
assume  any  value  between  0  and  1  with  equal 
probability. 

In  this  formulation,  the  sampling  distribu¬ 
tion  of  the  variable  y  is: 

f(y_1, ..., y_n | R) = R^{s} (1-R)^{n-s}


Selection and justification of a prior is
often difficult. In some cases selection has
been based on convenience, as in the case of
so-called “conjugate priors”, distributions
such that prior and posterior have the same
functional form. Ideally, a prior should be
selected on the basis of test data which, if
sufficient, permit inferences about the
parameter(s). When such data are scarce, more
or less arbitrary judgments must be made
about the prior if a Bayesian method is to be
used. This sometimes leads to the adoption
of a “flat or uniform prior”, which embodies
the concept of minimum information about
values the parameter can assume.

5.4.6.2 The Bayesian Binomial Failure
Model, Bernoulli Sampling

The Bayesian binomial model corresponds
to the frequentist binomial model where n
units are tested and x defectives are found in
the lot (Bernoulli Sampling). Each unit tested
in the Bayesian binomial model has a sampling
distribution:

f(y|R) = R^{y} (1-R)^{1-y},    0 < R < 1,  y = 0, 1    (5-63)

where y = 0 indicates a defective unit, and
y = 1 a non-defective unit. In the Bayesian
binomial model R is not considered fixed, but
a random variable, assigned a prior distribution
g(R). Several choices of a prior are available.
Additional information is given in [9].


where s is the number of successes out of n
units on test and is equal to

s = \sum_{i=1}^{n} y_i


The posterior PDF of R is g(R | y_1, ..., y_n),
which is a beta distribution with parameters
p = s+1 and q = n-s+1 and is equal to:

g(R | y_1, ..., y_n) = \frac{\Gamma(n+2)}{\Gamma(s+1)\Gamma(n-s+1)} R^{s} (1-R)^{n-s}    (5-65)

The posterior mean E[R | y_1, y_2, ..., y_n]
is the mean of the beta distribution, which
yields a posterior estimate for R, R̂ = (s+1)/(n+2)
or R̂ = (n-x+1)/(n+2). Notice that the
prior mean for R was E[R] = 1/2, and that
the posterior estimate of R reevaluates the
mean on the basis of the observations.
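
The uniform-prior result can be sketched in a few lines. The function name and the sample counts below are hypothetical; only the formulas p = s+1, q = n-s+1 and R̂ = (s+1)/(n+2) come from the text.

```python
def uniform_prior_posterior(n, s):
    """Beta posterior parameters and posterior mean of R under a uniform prior."""
    p = s + 1          # first beta parameter
    q = n - s + 1      # second beta parameter
    mean = p / (p + q) # equals (s + 1) / (n + 2)
    return p, q, mean

# With no data the estimate is the prior mean, 1/2.
print(uniform_prior_posterior(n=0, s=0))

# Hypothetical test: 10 units, all successful.
p, q, r_hat = uniform_prior_posterior(n=10, s=10)
print(p, q, round(r_hat, 4))
```

Note that ten successes in ten trials give an estimate below 1, because the uniform prior behaves like one pseudo-success and one pseudo-failure.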

5.4.6.4  Truncated  Uniform  Prior 

For a truncated uniform prior it is assumed
that there is a basis for believing that the
component reliability R is at least as large as
R_0, but that it can assume any value from R_0
to 1 with equal likelihood. Mathematically,
the prior on R is:

g(R) = \begin{cases} \frac{1}{1-R_0}, & R_0 > 0,\ R_0 < R < 1 \\ 0, & \text{elsewhere} \end{cases}    (5-66)
The mean of the posterior PDF of R yields:

\hat{R} = E[R | y] = \frac{s+1}{n+2} \cdot \frac{1 - I_{R_0}(s+2, n-s+1)}{1 - I_{R_0}(s+1, n-s+1)}    (5-67)

where s is the number of successes among n
units on test, and I_{R_0} is the cumulative beta
distribution evaluated at R_0.

A. Prior and Posterior Data

5.4.6.5 The Beta Prior


The uniform prior and truncated uniform
prior on R are less often applied than the beta
prior, which has two advantages. The beta
distribution has two extra fixed parameters,
δ and m, which allow a greater flexibility in
fitting test data, and the beta PDF is the conjugate
for binomial test data, i.e., the posterior
PDF on R is also a beta distribution. The
prior is:

g(R; \delta, m) = \frac{\Gamma(\delta+m)}{\Gamma(\delta)\Gamma(m)} R^{\delta-1} (1-R)^{m-1},    \delta, m > 0,  0 < R < 1    (5-68)

The data, obtained for an ejector gas generator,
were:

Prior number of successes:    δ = 3500
Prior number of failures:     m = 1.30
Number of test successes:     s = 90
Number of test failures:      x = 0.00

B. Calculations

s' = δ + s = 3500 + 90 = 3590
x' = m + x = 1.30 + 0.00 = 1.30

(1) Reliability Point Estimates:

(a) Prior:

\frac{\delta}{\delta+m} = \frac{3500}{3500 + 1.30} = .99963

(b) Posterior:

\frac{s'}{s'+x'} = \frac{3590}{3590 + 1.30} = .99964
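
The two point estimates can be reproduced with the short sketch below. The function name is illustrative; the inputs are the ejector gas generator values quoted in the example.

```python
def beta_point_estimates(delta, m, s, x):
    """Prior and posterior mean reliability for a beta prior with binomial test data."""
    prior = delta / (delta + m)
    posterior = (delta + s) / (delta + m + s + x)
    return prior, posterior

prior_R, posterior_R = beta_point_estimates(delta=3500, m=1.30, s=90, x=0.0)
print(f"prior {prior_R:.5f}, posterior {posterior_R:.5f}")
```

Because the prior carries 3500 pseudo-successes, ninety failure-free tests move the estimate only in the fifth decimal place.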


The posterior is:

g(R | s) = \frac{\Gamma(\delta+m+n)}{\Gamma(\delta+s)\Gamma(m+n-s)} R^{\delta+s-1} (1-R)^{m+n-s-1},    \delta, m > 0,  0 < R < 1    (5-69)

where s is the number of successes out of n
units tested. The posterior mean yields an
estimate of R:

\hat{R} = \frac{\delta+s}{\delta+m+n}

Example. Bayesian Method

In the following example, R̂ is rewritten
in the form

\hat{R} = \frac{\delta+s}{\delta+s+m+n-s} = \frac{\delta+s}{\delta+m+s+x}

where x is the number of failures. In this expression
for R̂, δ represents the number of
prior successes, and m the number of prior
failures.

(2) Reliability one-sided confidence estimates

(a) Prior PDF at α = .99:

\frac{\Gamma(3501.3)}{\Gamma(3500)\Gamma(1.3)} \int_0^{R_L} p^{(3500-1)} (1-p)^{(1.30-1)} dp = 1 - .99

solving for R_L, R_L = .99852

At α = .95:

\frac{\Gamma(3501.3)}{\Gamma(3500)\Gamma(1.3)} \int_0^{R_L} p^{(3500-1)} (1-p)^{(1.30-1)} dp = 1 - .95

solving for R_L, R_L = .99900

At α = .80:

\frac{\Gamma(3501.3)}{\Gamma(3500)\Gamma(1.3)} \int_0^{R_L} p^{(3500-1)} (1-p)^{(1.30-1)} dp = 1 - .80

solving for R_L, R_L = .99942
(b) Posterior PDF:

At α = .99:

\frac{\Gamma(3591.30)}{\Gamma(3590)\Gamma(1.30)} \int_0^{R_L} p^{(3590-1)} (1-p)^{(1.30-1)} dp = 1 - .99

solving for R_L, R_L = .99855

At α = .95:


In [42, 35] a case is made for the employment
of discrete empirical priors, where prior
probability  assignments  are  made  for  discrete 
ranges  of  reliability.  The  posterior  reliability 
in  this  case  is  also  discrete  and  empirical, 
that  is,  both  prior  and  posterior  estimates 
appear  as  tabulated  functions  rather  than 
as  analytic  expressions.  In  many  cases  poster¬ 
ior  reliability  estimates  are  more  realistic  with 
empirical  priors  than  when  beta  priors  are 
used  with  binomial  component  test  data. 


\frac{\Gamma(3591.30)}{\Gamma(3590)\Gamma(1.30)} \int_0^{R_L} p^{(3590-1)} (1-p)^{(1.30-1)} dp = 1 - .95

solving for R_L, R_L = .99902

At α = .80:

\frac{\Gamma(3591.30)}{\Gamma(3590)\Gamma(1.30)} \int_0^{R_L} p^{(3590-1)} (1-p)^{(1.30-1)} dp = 1 - .80

solving for R_L, R_L = .99944


5.4.6.7 The Pascal Process

This  case  corresponds  to  the  negative  bi¬ 
nomial  model.  The  test  sampling  process  is 
Pascal  and  consists  in  testing  units  one  after 
the  other  until  x  =  n  -  s  defectives  have  been 
found.  The  number  of  units  tested  n,  is  a 
random  variable,  while  x  is  a  constant  selected 
before  testing.  With  a  beta  prior  on  R  the  pos¬ 
terior  becomes: 


g(R | \sum n_i) = \frac{\Gamma(\delta+m+\sum n_i)}{\Gamma(\delta+s)\Gamma(m+\sum n_i-s)} R^{\delta+s-1} (1-R)^{m+\sum n_i-s-1}    (5-70)

C. Calculation Summary

                                   Prior      Posterior

1. Reliability Point Estimate      .99963     .99964

2. Reliability One-Sided
   Lower Confidence Estimates:

      α = .99                      .99852     .99855
      α = .95                      .99900     .99902
      α = .80                      .99942     .99944

5.4.6.6  Empirical  Priors 

Certain objections to beta priors on the
reliability of binomial components have led
to the use of empirical priors. Strong beta
priors used in conjunction with binomial
component data can lead to unrealistic posterior
estimates of reliability when a string of
failures occurs. For instance, [35] shows that
if 99 prior successes in 100 prior trials are
assumed for the prior, with a corresponding
estimate R̂ = 0.99 for prior reliability, a
string of 10 successive failures would yield a
posterior estimate R̂ = 0.90. This high value
of component reliability after such a string
of failures is hardly credible. It is also shown
that use of very weak priors gives unrealistic
LRCB's in the no-failure case.
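
The effect described here can be traced step by step. The sketch below is illustrative only: it treats the strong prior as 99 pseudo-successes and 1 pseudo-failure and recomputes the posterior mean after each additional consecutive failure.

```python
def beta_posterior_mean(prior_successes, prior_failures, test_successes, test_failures):
    """Posterior mean reliability for a beta prior with binomial test data."""
    successes = prior_successes + test_successes
    trials = successes + prior_failures + test_failures
    return successes / trials

# Strong prior from the text: 99 successes in 100 prior trials.
estimates = [beta_posterior_mean(99, 1, 0, k) for k in range(11)]

for k, r in enumerate(estimates):
    print(f"{k:2d} consecutive failures -> posterior estimate {r:.3f}")
```

Even after ten failures and no successes, the estimate has fallen only from 0.99 to 0.90, which is the objection raised against strong beta priors.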


which is also a beta PDF. An estimate of R
is the posterior mean:

\hat{R} = \frac{\delta+s}{\delta+m+\sum n_i}    (5-71)

In the above equations, s is the number of required
successes or survivals, n_i represents the
number of units actually tested to obtain one
survival for each i, and \sum n_i is

\sum_{i} n_i = n.


As an example, assume that a total of 15
survivals is required and that a sequence of
defectives, d, and nondefectives, d̄, is as follows:

d d̄ | d̄ | d d̄ | d̄ | d̄ | d̄ | d d d̄ | d̄ | d̄
n1   | n2 | n3   | n4 | n5 | n6 | n7     | n8 | n9

d̄   | d̄   | d̄   | d̄   | d̄   | d̄
n10 | n11 | n12 | n13 | n14 | n15

then:

n_2 = n_4 = n_5 = n_6 = n_8 = n_9 = n_10 = n_11 =
n_12 = n_13 = n_14 = n_15 = 1

In either censoring case, an estimate of λ is
given by the posterior mean:

\hat{\lambda} = \frac{\phi+x}{\tau+T}    (5-74)


n_1 = n_3 = 2
n_7 = 3

If δ = m = 0, then

\hat{R} = \frac{s}{\sum n_i} = \frac{15}{19} = 0.789.


This point estimate of R is the same as for the
corresponding frequentist case. If, on the
other hand, δ = 1, m = 2, where δ and m
embody some prior knowledge of R, using
equation 5-71, we have:

\hat{R} = \frac{\delta+s}{\delta+m+\sum n_i} = \frac{1+15}{3+19} = 0.727
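
Both estimates in the Pascal example can be reproduced as follows. This is a sketch with illustrative names; the list of n_i values restates the counts given in the example.

```python
# Units tested to obtain each of the 15 survivals (n_1 ... n_15) from the example.
n_i = [2, 1, 2, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1]

def pascal_posterior_mean(delta, m, s, total_units):
    """Posterior mean of R: (delta + s) / (delta + m + sum of n_i)."""
    return (delta + s) / (delta + m + total_units)

s = len(n_i)               # required survivals
total_units = sum(n_i)     # total units tested

no_prior = pascal_posterior_mean(0, 0, s, total_units)    # delta = m = 0
with_prior = pascal_posterior_mean(1, 2, s, total_units)  # delta = 1, m = 2

print(total_units, round(no_prior, 3), round(with_prior, 3))
```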


5.4.6.8 Exponential Reliability Model,
Poisson Process

A Poisson process is characterized by constant
failure rate λ. If the prior on λ is selected
to be Gamma with parameters τ and φ,
then

g(\lambda; \tau, \phi) = \begin{cases} \frac{\tau^{\phi} \lambda^{\phi-1} e^{-\tau\lambda}}{\Gamma(\phi)}, & \lambda > 0 \\ 0, & \text{elsewhere} \end{cases}    (5-72)

Then, the posterior is also Gamma, regardless
of whether the test is Type I Censored (fixed
test time T) or Type II Censored (fixed number
of failures x).

Letting T represent the fixed testing time
in Type I censoring, and

T = \sum_{i} t_i


A problem with prior data, such as φ and
τ, and actually observed data, such as x and
T, is that they may be incompatible. A test
of hypothesis at a significance level α (see
Appendix C) has been devised for the acceptance
or rejection of the prior λ_0 = φ/τ after
x and T have been observed. This test is:

\frac{\chi^2(2x, \alpha/2)}{2T} \le \lambda_0 \le \frac{\chi^2(2x+2, 1-\alpha/2)}{2T}    (5-75)

If λ_0 lies between the indicated χ²/2T
limits, it is accepted; otherwise it is rejected.
Only after this test has been performed and
λ_0 has been accepted is it permissible to compute
an upper limit on failure rate, using prior
data, as follows:

\lambda_u = \frac{\chi^2(2[\phi+x]+2, 1-\alpha)}{2(\tau+T)}    (5-76)


Also: 


Rl  =  e  (5-77) 

Example,  Bayesian  Process,  Poisson: 


Assume that the prior failure rate has been
estimated to be λ_0 = φ/τ = 2/600 = .00333.
Assume also that the failure data observed
later were the same as given in the example
of § 5.4.1.1.1 with x = 8 failures and T =
555 hours. Selecting α = 0.2, we have:

\frac{\chi^2(16, 0.1)}{1110} \le \lambda_0 \le \frac{\chi^2(18, 0.9)}{1110}    (5-78)


in Type II censoring, where the t_i's represent
the time of failure of the ith unit, and x is the
number of failures, the posterior on λ is given
by:

g(\lambda | x) = g(\lambda | T) = \frac{(\tau+T)^{\phi+x} \lambda^{\phi+x-1} e^{-\lambda(\tau+T)}}{\Gamma(\phi+x)}    (5-73)


or:

0.00839 < λ_0 < 0.02341

Since λ_0 = .00333 is outside the interval, the
prior data and the test data are deemed to come
from separate populations and should not be
combined. Both the prior and the test data should
be examined for error. Unless an error is
found in the test, normal practice is to discard
the prior and accept the test data.
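
The compatibility check in this example can be sketched as below. The function name is illustrative, and the two chi-square quantiles are entered as constants taken from standard tables (lower-tail probabilities 0.10 and 0.90) rather than computed, so that the sketch needs no statistics library.

```python
def prior_compatibility(lambda_0, chi2_lower, chi2_upper, total_time):
    """Apply the acceptance test: chi2_lower/(2T) <= lambda_0 <= chi2_upper/(2T)."""
    lower = chi2_lower / (2.0 * total_time)
    upper = chi2_upper / (2.0 * total_time)
    return lower <= lambda_0 <= upper, lower, upper

# Standard chi-square table values (assumed inputs, not computed here).
CHI2_16_DF_P10 = 9.312    # 2x = 16 degrees of freedom, alpha/2 = 0.10
CHI2_18_DF_P90 = 25.989   # 2x + 2 = 18 degrees of freedom, 1 - alpha/2 = 0.90

phi, tau = 2, 600         # prior pseudo-failures and pseudo-test time
x, T = 8, 555             # observed failures and test hours

accepted, lower, upper = prior_compatibility(phi / tau, CHI2_16_DF_P10, CHI2_18_DF_P90, T)
print(f"{lower:.5f} <= lambda_0 <= {upper:.5f}; lambda_0 = {phi / tau:.5f}; accepted = {accepted}")
```

The prior rate of .00333 falls below the lower limit, so the prior is rejected, in agreement with the text.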

5.4.6.9 Prior Strength

The fixed parameters of a prior distribution
embody its strength. For instance, the
posterior estimate of λ, λ̂, has been given as
(φ+x)/(τ+T). The corresponding prior λ_0 is
readily seen to be φ/τ from the form of the
conjugate gamma.

φ can be construed as pseudo-prior test
failures, and τ as pseudo-prior test time. Then
the prior is weak when small values of φ and
τ are selected, and strong when large values
of these fixed parameters are used. Strong
priors are harder to “wash out” as test data
accrue.

Predicted reliability can be used as a basis
for establishing a prior best estimate. It fixes
the ratio φ/τ, and the selection of τ establishes
prior strength and defines the entire prior
distribution. If τ is large, the prior is strong
and relatively difficult for test data to modify.
Small τ yields a weak prior easily discounted
by test data.

Figure 5-28 presents informal rules for
choosing τ. They have been found by experience
to yield acceptable results for many
types of subsystems. The factor M is total
expected program test time expressed in
equivalent missions. The factor t is total
historical test time.

To use categories 1-4 of figure 5-28, it is
necessary to analyze the proposed test program
for the component under study to determine
the expected number of equivalent missions,
M, that will be derived from testing.
After this number has been determined, select
the category that best characterizes the information
available for the component. The
number of expected missions is then multiplied
by the accompanying fraction to obtain
the number of prior missions τ. The failure
rate that has been predicted becomes the estimated
prior failure rate λ_0. In categories 5-8
of the table, prior data consist of a total
number of equivalent missions τ. This completes
the selection of the prior parameters φ and τ;
λ_0 = φ/τ, so that φ, which is defined as the
number of prior failures, is uniquely determined
when τ is selected. It should be noted that φ
need not be a whole number.

Two  key  points  should  be  noted  when 
using  these  techniques  to  set  prior  parameters. 
When  the  prior  information  is  somewhat  sub¬ 
jective  (Categories  1-4)  the  weight  it  will  bear 
in  the  final  joint  estimate  will  vary  from  1/32 
to  at  most  1/4  the  weight  of  objective  test 
data.  This  is  a  control  against  overemphasis  of 
a  prior  judgement  that  may  be  somewhat  in 
error.  When  frequency  data  are  used  fully, 
(Category  8),  the  Bayesian  framework,  al¬ 
though  still  formally  employed,  is  equivalent 
to  treating  prior  information  as  though  it 
were  early  objective  test  information  on  the 
component  in  question. 


Category   Description                                      τ (prior missions)

1          Informed Qualitative Judgement                   1/32 M

2          Based Primarily upon Generic                     1/16 M
           Part Handbook Data

3          Partly based upon Generic Part                   1/8 M
           Handbook Data; partly upon
           Similar Part Handbook Data

4          Based Primarily upon Similar                     1/4 M
           Part Handbook Data

5          Frequency Data Solely: Similar                   0.1 t
           Parts, Similar Applications

6          Frequency Data Solely: Identical                 0.2 t
           Parts, Similar Applications

7          Frequency Data Solely: Similar                   0.3 t
           Parts, Identical Applications

8          Frequency Data Solely: Identical                 1.0 t
           Parts, Identical Applications

Figure 5-28. Decision Rules for Estimating Prior
Strength
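
The decision rules of figure 5-28 lend themselves to a small lookup. The sketch below is illustrative only: the function name, argument names and the sample inputs (a predicted failure rate of 0.01 per mission, 400 expected missions, 200 historical missions) are hypothetical, while the multipliers are those of the figure.

```python
# Multipliers from figure 5-28.
FRACTION_OF_M = {1: 1 / 32, 2: 1 / 16, 3: 1 / 8, 4: 1 / 4}  # categories 1-4, applied to M
FRACTION_OF_T = {5: 0.1, 6: 0.2, 7: 0.3, 8: 1.0}            # categories 5-8, applied to t

def prior_parameters(category, lambda_0, expected_missions=None, historical_time=None):
    """Return (phi, tau): pseudo-prior failures and pseudo-prior test time."""
    if category in FRACTION_OF_M:
        tau = FRACTION_OF_M[category] * expected_missions
    elif category in FRACTION_OF_T:
        tau = FRACTION_OF_T[category] * historical_time
    else:
        raise ValueError("category must be an integer from 1 to 8")
    phi = lambda_0 * tau   # follows from lambda_0 = phi / tau; need not be a whole number
    return phi, tau

# Hypothetical inputs for illustration.
phi_3, tau_3 = prior_parameters(3, lambda_0=0.01, expected_missions=400)
phi_8, tau_8 = prior_parameters(8, lambda_0=0.01, historical_time=200)
print(phi_3, tau_3, phi_8, tau_8)
```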

5.4.6.10 Point and Interval Estimation of
Bayesian Parameter

In Bayesian analysis, it is assumed that
[10]:

a) A prior PDF on θ, g(θ), has been selected.

b) One random observation is obtained
from the prior distribution on θ.

c) A random sample is drawn from f(u|θ).

It is desired to estimate the value of θ
from the observations (t_1, t_2, ..., t_n), each
drawn from f(u|θ). Many different estimates
exist, but a commonly used method to obtain
a “best” estimate is to require that the loss
function (θ̂ - θ)^2 be minimized. This leads to
a point estimate of θ, θ̂, given by the mean of
the posterior PDF. Therefore:

\hat{\theta} = E(\theta | u) = \int_{-\infty}^{\infty} \theta g(\theta | u) d\theta    (5-70)

To obtain an upper bound θ_u or a lower
bound θ_L for θ at confidence 1 - α, it is only
necessary to calculate:

\int_{\theta_u}^{\infty} g(\theta | u) d\theta = \alpha    or    \int_{-\infty}^{\theta_L} g(\theta | u) d\theta = \alpha    (5-71)

Other interval estimates for θ are also easy to
obtain by similar integrals. Once the posterior
is known in Bayesian methods that use discrete
empirical priors, point and interval estimates
of the parameter of a failure model can
be obtained directly from the discrete posterior
parameter histogram.
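
For a discrete posterior histogram, the point estimate and a one-sided lower bound reduce to sums. The sketch below is a hypothetical illustration: the tabulated values and probabilities are invented, and the lower bound is taken as the largest tabulated value for which the posterior probability lying strictly below it does not exceed α.

```python
def posterior_mean(values, probs):
    """Point estimate: mean of the discrete posterior."""
    return sum(v * p for v, p in zip(values, probs))

def lower_bound(values, probs, alpha):
    """Largest tabulated value with posterior probability below it <= alpha."""
    pairs = sorted(zip(values, probs))
    below = 0.0               # posterior probability strictly below the current value
    bound = pairs[0][0]
    for value, prob in pairs:
        if below <= alpha:
            bound = value
        below += prob
    return bound

# Hypothetical discrete posterior on reliability.
values = [0.90, 0.95, 0.99]
probs = [0.05, 0.20, 0.75]

theta_hat = posterior_mean(values, probs)
theta_L = lower_bound(values, probs, alpha=0.10)   # 90% one-sided lower bound
print(round(theta_hat, 4), theta_L)
```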


5.4.6.11 Assessing Validity of Bayesian
Methods in Presence of Erroneous
Priors

In [43], a study was performed to show the
influence  of  erroneous  prior  point  estimate 
and  strength  on  Bayesian  reliability  estimates. 
The  errors  studied  in  the  model  were:  a)  in¬ 
correct  distributional  form  of  the  prior,  b) 
incorrect  mean  for  the  prior,  and  c)  incorrect 
number  of  failures,  set  to  be  zero  or  three 
times  the  expected  number. 

The  errors  were  investigated  by  Monte 
Carlo  simulation,  using  weak,  moderate  and 
strong  priors.  It  was  concluded  that  when  the 
test  data  distribution  is  correctly  identified, 
but  the  strength  of  the  prior  is  incorrectly 
chosen  (usually  too  strong),  a  Bayesian  pre¬ 
diction  may  seriously  overstate  reliability 
early  in  the  program.  The  degree  of  over¬ 
statement  is  both  a  function  of  the  size  of  the 
error  and  the  assigned  strength  or  degree  of 
belief  in  that  prior.  Error  in  the  estimated 
mean  is  particularly  important.  Without 


reasonable  assurance  that  a  prior,  even  if 
weak,  is  positioned  close  to  the  subsequent 
performance  of  the  equipment  under  test, 
Bayesian  methods  are  likely  to  give  misleading 
results.  Great  caution  should  therefore  be 
exercised  in  utilizing  Bayesian  methods.  It 
is  important,  in  particular,  not  to  use  Bayes¬ 
ian  methods  as  an  excuse  for  not  collecting 
data.  The  penalty  is,  of  course,  that  the  prior 
estimate  will  be  given  too  much  final  cred¬ 
ence,  when  in  fact,  it  should  be  used  only  as 
an  initial  educated  guess  to  be  confirmed  or 
denied  in  due  time  by  newly  collected  data. 

5.4.6.12  Summary  Chart  for  Bayesian 
Methods 

Figure  5-29  provides  a  summary  chart  of 
Bayesian  methods  for  assessing  R,  MTBF, 
failure  rate,  and  other  distributional  param¬ 
eters. 

An  explanation  of  the  headings  of  figure 
5-29  is: 


Heading                  Meaning                     Reference/Paragraph

Exhibit #                Refers to corresponding     Figure 5-11
                         classical models

Item Classification      Refers to corresponding     Figure 5-11
                         classical models            and § 5.4.1

Failure/Reliability      Refers to corresponding     Figure 5-11
Model                    classical models            and § 5.4.1

Raw Data &               Refers to corresponding     Figure 5-11
Type of Test             classical models            and § 5.4.1

Assumed Prior            Prior Distribution          § 5.4.6

Posterior                Posterior Distribution      § 5.4.6

Point Estimates          Point Estimates             § 5.4.6
(Posterior Mean)

One-Sided Confidence     Interval Estimates          § 5.4.6
Limits

5.4.7  Adjustments  to  Reliability  Estimation, 
Derating  and  Uprating 

Systems  that  operate  in  space,  undersea  or 
in  other  extreme  environments  may  be  tested 
in  ground  test  environments  which  may  not 
be  able  to  simulate  the  mission  environment 
realistically.  For  example,  it  may  not  be  pos¬ 
sible  to  simulate  the  heating  and  stresses  ex¬ 
perienced  by  a  re-entry  vehicle  at  maximum 
deceleration. Derating has been used to
address  such  problems.  As  stated  in  [27] 
“Data  for  the  generation  of  derating  curves 
come  from  several  sources,  such  as  life  tests 
of  component  parts,  system  life  tests,  or  field 
operation.”  A  family  of  curves  of  failure  rate 
versus  thermal  stress  is  derived,  with  electrical 
stress  as  a  parameter  of  each  curve.  Given 
mission  thermal  and  electrical  stresses,  failure 
rates  can  be  derived  from  the  curves. 

For  state-of-the-art  systems,  derating  curves 
for  particularly  stressed  environments,  such  as 
re-entry,  may  not  exist,  or  may  be  based  on 
engineering  studies  or  a  few  readings  of 
thermal  and  stress  sensors  in  flight  tests. 
Under  these  conditions,  the  curves  may  be 
highly  conjectural  and  subject  to  appreciable 
errors.  Moreover,  the  failure  process  for  stress¬ 
es  outside  the  range  of  observation  may  be 
known  only  from  a  set  of  possibly  inaccurate 
curves. 

In [28], k-factors are introduced for derating
components. They are defined as
weighting  factors  used  to  convert  time  in 
ground  test  into  pseudo-flight  test  time.  To 
make  the  k-factors  as  realistic  as  possible,  it  is 
necessary  to  have  test  data  from  both  ground 
tests  and  flight  tests.  The  accuracy  of  the 
derating  method  is  dependent  on  the  amount 
of  flight  test  failure  data  available.  See  [30], 
[31], [32] for applications of k-factors to
reliability  estimation. 

Uprating  may  apply  when  systems  are  pur¬ 
posely  stressed  in  testing  beyond  the  nominal 
stresses  for  which  reliability  indices  are  sought 
but not beyond system design limits. Accelerated
testing is discussed in [9] with examples
of models available in the literature to
relate stress factors to certain reliability measures.
The models presented are:

a.  Power  Rule  Model 

The Power Rule Model [9] can be derived
via considerations of kinetic theory and activation
energy, where the item MTBF, θ, is inversely
proportional to the Pth power of applied
stress:

\theta = \frac{K}{S^{P}}    (5-72)

where K is a constant to be estimated, P is
the power of applied stress, also to be estimated,
and S is stress. Point and interval estimates
of K and P are obtained and used to
make inferences about θ at mission stress.


b. Arrhenius Reaction Rate Model

The Arrhenius Reaction Rate Model [9]
has applicability to semiconductor materials.
It is given by:

\lambda = e^{(A - B/T)}    (5-73)

where λ is the hazard rate at absolute temperature
T, and A and B are empirical parameters
to be estimated from multiple sets of test
data.

c. Eyring Model for a Single Stress [9]

In this model, the hazard rate, λ, from the
Arrhenius Reaction Rate Model [9] is related
to operating temperature T by:

\lambda = T e^{(A - B/T)}    (5-74)

d. Generalized Eyring Model [9]

This model is applicable to items subjected
to two types of stresses, a thermal stress, T,
and a non-thermal one, such as an electric
voltage, V. The hazard rate, λ, is then given
by:

\lambda = A T e^{[CV + (DV - B)/kT]}    (5-75)

A, B, C, and D are empirical constants to be
estimated.
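
The four stress models can be written as small functions for exploration. This is a sketch under stated assumptions, not the reference's implementation: the function names and all parameter values are hypothetical, the generalized Eyring form used is λ = A·T·exp[CV + (DV - B)/(kT)], and k is assumed to be Boltzmann's constant in eV/K.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # assumed meaning of k in the generalized Eyring model

def power_rule_mtbf(K, P, S):
    """Power Rule: MTBF is inversely proportional to the P-th power of stress S."""
    return K / S ** P

def arrhenius_hazard(A, B, T):
    """Arrhenius: hazard rate at absolute temperature T."""
    return math.exp(A - B / T)

def eyring_hazard(A, B, T):
    """Eyring (single stress): Arrhenius hazard scaled by T."""
    return T * math.exp(A - B / T)

def generalized_eyring_hazard(A, B, C, D, T, V, k=BOLTZMANN_EV_PER_K):
    """Generalized Eyring: thermal stress T and non-thermal stress V (e.g. voltage)."""
    return A * T * math.exp(C * V + (D * V - B) / (k * T))

# Hypothetical parameter values, for illustration only.
mtbf_at_mission = power_rule_mtbf(K=1000.0, P=2.0, S=10.0)
ratio = arrhenius_hazard(A=5.0, B=4000.0, T=350.0) / arrhenius_hazard(A=5.0, B=4000.0, T=300.0)
print(mtbf_at_mission, round(ratio, 2))
```

In practice the constants would be estimated from test data at several stress levels before any extrapolation to mission stress.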

Use  of  these  models  to  develop  derating 
and  uprating  data  has  often  proven  difficult 
[33]. 

5.5  REFERENCES 

1. Ascher, H. and Feingold, H., Is There Repair
After Failure?, Proceedings 1978
Annual Reliability and Maintainability
Symposium, pp. 190-197.

2. Cox, D. and Lewis, P., The Statistical
Analysis of Series of Events, Barnes and
Noble, New York, 1966.

3. Kendall, M. G. and Stuart, A., The Advanced
Theory of Statistics, Three-Volume
Edition, Charles Griffin and Co.,
1970.

4. Bury, K. V., Statistical Models in Applied
Science, John Wiley and Sons,
1975.

5. Kapur, K. C. and Lamberson, L. R.,
Reliability in Engineering Design, John
Wiley and Sons, 1977.




6. Hartvigsen, D., Private Communication.

7. Jackson, L. L., Simple Way to Predict
Product Reliability, Machine Design
(Graphical method for Weibull point and
interval estimation), 8/9/79.

8. Lilliefors, H. W., On the Kolmogorov-Smirnov
Test for the Exponential Distribution
with Mean Unknown, American
Statistical Association Journal, Vol.
64, pp. 387-389, 1969.

9. Mann, N. R., Schafer, R. E., and
Singpurwalla, N. D., Methods for Statistical
Analysis of Reliability and Life
Data, J. Wiley and Sons, 1974.

10. NAVORD OD 29304/Addendum - Statistical
Exposition of the Guide Manual for
Reliability Measurement Program,
11/15/67.

11. Hald, A., Statistical Theory with Engineering
Applications, John Wiley, 1957.

12. Lloyd, D. K., and Lipow, M., Reliability:
Management, Methods and Mathematics,
Prentice-Hall, 1962, Second Edition published
by the authors, 1977.

13. CRC Handbook of Tables for Probability
and Statistics, 2nd Edition, The Chemical
Rubber Company, 1968.

14. Demskey, S., Systems Estimation from
Variables and/or Attribute Parameters,
General Electric TIS No. 68SD264,
1964.

15. Maintainability Design Criteria Handbook
for Designers of Shipboard Electronic
Equipment, Bureau of Ships,
NAVSHIPS 94324, 1965.

16. Lilliefors, H. W., On the Kolmogorov-Smirnov
Test for Normality with Mean
and Variance Unknown, American Statistical
Association Journal, Vol. 62, pp.
399-402, 1967.

17. Nelson, W., Hazard Plot Analysis of
Incomplete Failure Data, Proc. of 1969
Annual Symposium on Reliability, January
1969, pp. 391-403.

18. Lee, E. T., Statistical Methods for Survival
Data Analysis, Lifetime Learning
Publications, Belmont, CA, 1980.

19. Nelson, W., Theory and Applications of
Hazard Plotting for Censored Failure
Data, Technometrics, Vol. 14, No. 4,
pp. 945-966.

20. HP Books: Graph Paper from your
Copier.

21. TEAM, Box 25, Tamworth, New Hampshire
03886.

22. Liittschwager, J. M., Result of a Gamma
Lognormal and Weibull Sampling Experiment,
Industrial Quality Control, pp.
124-127, 9/1965.

23. Cable, C. W. and Virene, E. P., Structural
Reliability with Normally Distributed
Static, and Dynamic Loads and
Strength, 1967 Annual Symposium on
Reliability, 1/1967.

24. Shooman, M. L., Probabilistic Reliability:
An Engineering Approach, McGraw-Hill,
1968.

25. Calvin, T. W., Modeling the Bathtub
Curve, Proceedings of 1973 Annual Reliability
and Maintainability Symposium,
pp. 477-482, 1973.

26  Lieberman,  G.  J.,  The  Status  and  Impact 
of  Reliability  Methodology,  Naval  Re¬ 
search  Logistics  Quarterly,  Vol.  16, 
No.  1,  1969. 

27.  Arinc  Research  Corporation,  Reliability 
Engineering,  Prentice-Hall,  1 964. 

28.  Lockheed,  D056702B  C-3  Reliability 
Program  Summary  Document,  Lockheed 
Missile  and  Space  Company.  Jan,  1982. 

29.  Marin,  N.  R.  and  Fertig,  K.  W.,  Simpli¬ 
fied  Efficient  Point  and  Interval  Esti¬ 
mators  for  Weibull  Parameters,  Techno- 
metrics,  Vol.  17,  No.  3,  8/1975. 

30.  Kern,  G.  A.,  et  al.,  Operational  In¬ 
fluences  on  Reliability,  Hughes  Air¬ 
craft  Co.,  RADC-TR-76-294,  9/1976. 

31.  Tomsky,  J.  L.,  Chow.  T.  R.,  and  Schiller, 
L.  D.,  System  Reliability  Estimation 
from  Several  Data  Sets.  Proceedings 
1976  Annual  Reliability  and  Maintain¬ 
ability  Symposium. 

32.  Shelly,  B.  F.  and  Stovall,  F.  A.,  Field- 
Laboratory  Reliability  Relationship,  Pro¬ 
ceedings  1976,  Annual  Reliability  and 
Maintainability  Symposium. 




33. Singpurwalla, N. D., A Problem in Accelerated Life Testing,
JASA, Vol. 66, No. 336, 12/1971.

34. Johns, M. V. and Lieberman, G. J., An Exact Asymptotically
Efficient Confidence Bound for Reliability in the Case of the
Weibull Distribution, Technometrics, Vol. 8, No. 1, February,
1966.

35. MacFarland, W. J., Bayes' Equation, Reliability and Multiple
Hypothesis Testing, IEEE Trans. on Rel., Vol. R-21, No. 3, 8/1972.

36. Finkelstein, J. M., Confidence Bounds on the Parameters of the
Weibull Process, Technometrics, Vol. 18, No. 1, February, 1976.

37. Meeker, W. Q. and Nelson, W., Weibull Percentile Estimates and
Confidence Limits from Singly Censored Data by Maximum Likelihood,
IEEE Transactions on Reliability, Vol. R-25, No. 1, April, 1976.

38. Nelson, W., Charts for Confidence Limits and Tests for Failure
Rates, Journal of Quality Technology, Vol. 4, No. 4, October,
1972.

39. Thomas, D. R. and Grunkemeier, G. L., Confidence Interval
Estimation of Survival Probabilities for Censored Data, Journal of
the American Statistical Association, Vol. 70, No. 352, December,
1975.

40. Welker, E. L. and Lipow, M., Estimating the Exponential
Failure Rate from Data with No Failure Events.

41. Shimi, I. and Tsokos, C. P., The Bayesian and Non-Parametric
Approach To Reliability Studies: A Survey of Recent Work, 1977.

42. MacFarland, W. J., Use of Bayes' Theorem in its Discrete
Formulation for Reliability Estimation Purpose, Proc. 1968
Symposium on Reliability and Maintainability, p. 362 ff.

43. Evaluation Associates, Analysis of Bayesian Reliability
Estimation; Method of NAVORD OD 29304A, Final Report, 1974.

44. Evaluation Associates, Measurement and Forecast of Reliability
Growth During Hardware Development, 7/1/75.

45. Lipa, T. F., Reliability Growth Study, Hughes Aircraft Co.,
RADC-TR-75-253, October, 1975.

46. Mann, N. R., Fertig, K. W., and Scheuer, E. M., Tolerance
Bounds and a New Goodness-of-Fit for Two-Parameter Weibull or
Extreme-Value Distribution, WPAFB; Aerospace Research
Laboratories, ARL 71-0077, Dayton, Ohio, May 1971.

47. Draft MIL-STD-781D, Reliability Testing for Engineering
Development, Qualification and Production, NESC, 12/31/80.

48. Crow, L., Private Communication, 10/81.

49. Bazovsky, I., Reliability Theory and Practice, Prentice-Hall,
1961.

50. Grant, E. L., Statistical Quality Control, 3rd Ed.,
McGraw-Hill, 1964.

51. Owen, D. B., Factors for One-Sided Tolerance Limits and for
Variables Sampling Plans, Sandia Corporation Monograph, SCR 607,
3/1963.




Section  6 

SOFTWARE  EVALUATION 


This section examines the nature of software errors and describes
methods of assessing software reliability which have been tested
in service. Software is playing an increasingly critical role in
military and industrial systems, and software errors can be the
cause of system failures.

6.1  DEFINITION  OF  SOFTWARE

Software has been defined [1] as: written or printed data, such as
programs, routines, and symbolic languages, essential to the
operation of computers.

Another more general definition [2] has been quoted in [3] as,
“Software is information that is: a) structured with logical and
functional properties, b) treated and maintained in various forms
and representations during its life cycle, c) tailored for machine
processing in its fully developed state.”

The tangible elements on which software is stored, printed or
displayed, such as magnetic or paper tapes, disks, or punched
cards, are not considered software, but part of the system
hardware.

The distinction between intangible software and the tangible media
on which it is recorded is useful for purposes of reliability
analysis because the storage media are amenable to treatment by
conventional hardware reliability methods, while the intangible
software displays characteristics that preclude such methods.

6.2  SOFTWARE  FAILURES 

Software does not break down as hardware components do when
operating or stored for any length of time. Yet software failures
in operating systems often occur, and they often occur randomly.
To understand how this random occurrence of software failures
takes place, it is necessary to review briefly certain
characteristics of software and its failure modes which are not
exhibited to the same extent by hardware.

A major difference between hardware and software, which seems at
first to preclude a probabilistic definition of software
reliability, is the fact that software is unchangeable during
repeated operation, in contrast to hardware which may exhibit
degradation leading to failures or random operational failures
best described in probabilistic terms.

Software, once delivered for operation, does not degrade due to
wear or fatigue. Whether stored on tape, disk or cards, software
command and control and application programs remain identical to
themselves until deliberately changed. Very occasionally, a tape
or disk may be subjected to a stray magnetic field which may
damage the stored data or instructions, but by the intangible
definition of software, this is considered a hardware failure.

Duplicate software systems yield identical results, whether
correct or not, when operated with a particular set of inputs.
Thus, replication does not confer the reliability benefits of
redundancy as is true for hardware, and there is no component
variability to contend with. Moreover, physical environment does
not affect the performance of intangible software.

In complex programs conditional branching (IF, GOTO, THEN, ELSE,
... in FORTRAN, PL/1, or ALGOL) often reflects the whole gamut of
possible decisions which a human might make under various
contingencies in an operational situation. The loops and feedback
paths in even the simplest programs lead rapidly to astronomically
large numbers of contingencies, which cannot be enumerated or
checked individually by even the fastest computers. As an example,
the simple program flow-chart [4] of figure 6-1 can be shown to
contain about $10^{20}$ distinct paths. If a computer could check
one billion paths per second, well over 3,000 years would be
required to check the program exhaustively.

Figure 6-1. “Simple” Logical Flow Chart

When operations begin, with real, as opposed to constructed, input
data, there is no way to know precisely which of the many possible
logical paths will be exercised. If the logic in the exercised
path is defective at any point, a software failure may occur.
Experienced software designers recognize these facts and seek to
limit the effects of errors, since the number of errors can be
reduced, but errors are generally never totally eliminated.

Military software depends on numerical algorithms to solve linear
or non-linear integro-differential equations of dynamics necessary
to steer maneuverable systems, evade enemy anti-missile systems,
filter redundant and error-contaminated multiple-sensor data, etc.
These algorithms are only approximations of physical laws. In many
cases, real-time computational speed constraints compel the use of
simplified algorithms, which are in effect approximations of
approximations of the physical laws. While these algorithms are
supposed to be valid for a specified range of operational
conditions, they may not be valid for all conditions within the
range, and thus a software error may occur. The following example
is from an earth orbital simulation program which was “thoroughly”
verified and performed flawlessly until, years later, North polar
trajectories were specified. The program produced trajectories
near the North Pole exhibiting strange positional errors that
could not arise from the slowly varying value of gravity over the
earth’s surface. It was found that the algorithm used for
trajectory calculation made use of the tangent of latitude, which
either overflowed the computer register or was not calculated
accurately enough for values of latitude close to 90°.
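The tangent-of-latitude problem is easy to reproduce in outline. The following sketch is not the orbital program described above; it only shows, with Python's standard `math` module, how quickly the tangent grows as latitude approaches 90°, which is the kind of behavior that can overflow a fixed-size register or lose accuracy. The function name and the sample latitudes are chosen for illustration.

```python
import math


def tan_latitude(lat_deg: float) -> float:
    """Tangent of a latitude given in degrees."""
    return math.tan(math.radians(lat_deg))


# The value grows without bound as the latitude approaches the pole.
for lat in (45.0, 89.0, 89.99, 89.9999, 89.999999):
    print(f"{lat:>10} deg   tan = {tan_latitude(lat):.6g}")
```

An algorithm verified only at moderate latitudes would never encounter these magnitudes, which is why the defect stayed latent until polar trajectories were requested.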

Software may interface with other programs or computers, receive
data from input sensors, service hundreds of users on an interrupt
basis in timesharing systems, drive tapes, disks, plotters or
CRT’s, and be employed in a multiprocessor environment, with added
time phasing and logical complexities. It thus encounters many
opportunities for failure.


6.2.1  Software  Failure  Modes  and  Random 
Occurrences  of  Software  Failures 

Examples of software error modes and some of their possible causes
are given below:

a. One of the logical paths is in error for certain operating
modes or inputs. This may be due to a coding error,
misunderstanding by the programmer of the required logic, or an
error in the program performance specification.

b. The computations in application programs are wrong for certain
functional arguments or input parameters. The cause may be error
in coding, in units defined, in algorithms used or in numerical
techniques. Sometimes the analytical expression which the
programmer started with is incorrect, and the basic mathematics is
in error.

c. A program “hangs up” when directing the activities of
multiprocessing equipment. The cause may be found in an error in
subroutine calls, programmed interface, or timing and phasing of
interrupts.

d. A program displays none of the failure modes cited above, but
aborts nevertheless. The cause may be that internally computed
quantities become too numerous to be accommodated in core memory,
that the phasing of overlaid programs is incorrect, that the
supporting compiler is in error, or simply that a very large or
very small quantity ($> 10^{n}$ or $< 10^{-n}$, for instance)
overflows or exceeds the permissible limits of the hardware or
compiler.

e. A program does not hang or fail but it keeps running without
doing anything. Under a particular set of input variables or
command instructions, the program may be looping endlessly along a
faulty logical path.
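Failure mode d can be illustrated on a present-day machine. The sketch below uses Python's double-precision floats, which is an assumption about the platform and not the hardware the manual had in mind: a quantity beyond the representable range silently becomes infinity, a quantity below it silently becomes zero, and some library routines abort with an exception.

```python
import math
import sys

largest = sys.float_info.max        # largest finite double
overflowed = largest * 10.0         # exceeds the representable range, becomes inf
underflowed = 1e-200 * 1e-200       # true value 1e-400 is too small, becomes 0.0

print("largest finite value:", largest)
print("after overflow      :", overflowed)
print("after underflow     :", underflowed)

# Some routines abort instead of returning a special value.
try:
    math.exp(1000.0)
except OverflowError as err:
    print("math.exp(1000.0) aborted:", err)
```

Whether the program aborts or continues with a meaningless value depends on the language and hardware; either outcome is a software failure in the sense used here.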

This brief list of software error modes is by no means exhaustive,
but it gives an idea of the kinds of errors often encountered. It
also helps to explain the basis of the random distribution of
software failures in time. Errors in operational programs are
latent; they manifest as failures only when certain combinations
of input parameters, commands, options, or data exercise the
defective parts of the program. Under a large variety of
circumstances, these inputs may be considered to be random sets
from all possible sets of inputs. Random sets of inputs, in turn,
cause randomly distributed failures in the corresponding outputs.
These random output failures, which can be analyzed statistically,
constitute the statistical basis for the concept of reliability as
applied to software.
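A small simulation can make this argument concrete. In the sketch below, everything is hypothetical: `defective_program` stands for a deterministic program with one latent error that is exposed by one input in a thousand, and inputs are drawn at random with a fixed seed. The program never changes, yet the number of runs between failures varies randomly.

```python
import random


def defective_program(x: int) -> bool:
    """Hypothetical program: returns True on success, False when the
    input exercises the latent defect (one input value in 1000)."""
    return x % 1000 != 0


def runs_between_failures(n_failures: int, seed: int = 1) -> list:
    """Feed random inputs and record how many runs precede each failure."""
    rng = random.Random(seed)
    gaps, runs = [], 0
    while len(gaps) < n_failures:
        runs += 1
        if not defective_program(rng.randrange(1_000_000)):
            gaps.append(runs)
            runs = 0
    return gaps


gaps = runs_between_failures(200)
print("shortest gap:", min(gaps))
print("longest gap :", max(gaps))
print("mean gap    :", sum(gaps) / len(gaps))
```

The mean gap settles near 1000 runs, while individual gaps scatter widely around it. That scatter is what a statistical reliability model describes.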

There is unfortunately no consensus in the literature as to
definitions or distinctions among commonly used terms such as
software bugs, errors, faults, or failures. In this manual, the
term error is used to denote any latent or hidden defect in
software.

A software failure is the occurrence or revelation of a software
error. A failure can be obvious, as when the computer stops
operating, or more subtle, as when the results of a computation
appear suspiciously large or small to an analyst and are verified
to be in error by a percentage which may be of little significance
in some applications, but may be critical in others (e.g., 2% of
the true value).

When an error has been uncovered in a program, it is generally
corrected before testing resumes. In the process of correcting the
program, other errors may be found. Frequently, a new error is
introduced during the correction process. This happens so
frequently that some statistical software models have attempted to
quantify this source of programming errors.

Some errors are not immediately corrected in the high level
language used to write the program, but are corrected for
expediency at a lower language level after the high level language
has been assembled or compiled. Such a correction is termed a
“patch.” As the number of patches increases, the likelihood of
introducing errors into a program increases greatly. This has
become such a problem that MIL-STD-1679 (NAVY) [9] specifies that
the total number of patch words in a program shall not exceed
0.005 times the total machine instruction words in the program.
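The patch limit is a simple ratio test. As a minimal sketch, the check below applies the 0.005 factor quoted above; the function names are hypothetical, and integer arithmetic is used only to avoid floating-point rounding at the boundary.

```python
# Patch-word limit quoted in the text: patch words <= 0.005 x instruction words.
LIMIT_NUMERATOR = 5        # 0.005 expressed as 5 / 1000
LIMIT_DENOMINATOR = 1000


def max_patch_words(total_instruction_words: int) -> int:
    """Largest whole number of patch words permitted by the 0.005 limit."""
    return total_instruction_words * LIMIT_NUMERATOR // LIMIT_DENOMINATOR


def within_patch_limit(patch_words: int, total_instruction_words: int) -> bool:
    """True if the program's patch words do not exceed the limit."""
    return patch_words * LIMIT_DENOMINATOR <= LIMIT_NUMERATOR * total_instruction_words


print(max_patch_words(100_000))           # a 100,000-word program
print(within_patch_limit(501, 100_000))   # one word over the limit
```

For a program of 100,000 machine instruction words, the limit works out to 500 patch words.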

The three terms, Criticality, Severity, and Priority, are in
common use to describe the impact of software errors on a task or
mission. For example, MIL-STD-1679 (NAVY) [9] employs Priority as
the technique for error classification. There are five levels of
Priority in MIL-STD-1679 (NAVY) [9], which are defined as:

•  Priority 1 is assigned to an error that prevents the
accomplishment of an operational or mission-essential function ...
or which jeopardizes personnel safety.

•  Priority 2 is assigned to an error that adversely affects the
accomplishment of an operational or mission-essential function ...
and for which no alternative work-around solution exists ... or
which interferes with an operator ... so as to degrade performance
... etc.

•  Priority 3 is the same as Priority 2, except that there is a
reasonable work-around solution.

•  Priority 4 is assigned to an error that is an operator
inconvenience or annoyance and does not affect a required
operational or mission-essential function.

•  Priority 5 is assigned to all other errors.

6.3  QUANTITATIVE  DEFINITION  OF 
SOFTWARE  RELIABILITY 

Commonly  found  quantitative  defini¬ 
tions^,  5, 6,7]  for  software  reliability  cover 
a  wide  spectrum  of  concepts.  The  most 
useful closely parallel the definition of hardware reliability:
software reliability is the probability that a given software
program will operate without failure for a specified time in a
specified usage environment (i.e., using actual mission data).

6.4  SOFTWARE  RELIABILITY 
PREDICTION 

While the literature [5,6,35-39] pertaining to software prediction
is extensive, as yet no accurate and generally applicable method
has been validated to predict the reliability or availability of
software. There are no accepted instruction error rates analogous
to the piece part failure rates of MIL-HDBK-217 [40], from which
software reliability can be predicted. Attempts to derive such
elemental rates on the basis of selected attributes of a
particular program, such as its “maturity level” [35] or
“complexity” [39], have been inconclusive to date. In fact,
studies have shown that the most complex modules in a software
system frequently contain the fewest errors. But this finding
could be explained by the fact that software managers often assign
the most difficult modules to the most experienced programmers.

6.5  SOFTWARE  RELIABILITY 
MEASUREMENT 

Software reliability measurement can begin as soon as the software
module or program completes initial debugging, but should
certainly encompass formal software validation. Reliability is
normally measured during acceptance testing before the software is
turned over to the user to determine if reliability requirements
have been met. This value can also be used to determine the effect
on reliability of different development and testing tools and
techniques. The measurement also allows a forecast of when testing
will be completed and whether a reliability goal can be met. The
measurement must take into account differences from the
operational environment including test data selection and
reliability growth.

Software failure rate is not a constant parameter, but decreases
continuously as a result of progressive error detection and
removal. The failure rates of interest are the rates observed at
various points in the development program, and also the rate
forecasted to apply at the beginning of system deployment or
service.

The very validity of software reliability measurement methods in
existence today is still strongly debated, with certain
authors [41] quite opposed to the development of software
reliability measures patterned after hardware reliability measures
and with methods which attempt to combine hardware and software
“failures” in assessing system reliability. Among others who
accept a more conventional view of software reliability, there are
fundamental differences of opinion about the form of the hazard
rate h(t) in software reliability. According to Myers [6],
“Proponents of the constant h(t) agree that the inputs appear to
be random because the input domain is so large. However, others
argue that h(t) increases during the time between errors, using
the rationale that the program’s inputs gradually close in on the
remaining errors. There are others [10] who believe that h(t)
decreases with time, arguing that the longer the program runs
without encountering an error, the lower the probability of
encountering one. Based on the earlier axiom that every time an
error is encountered, the probability of encountering one
increases, one could postulate that h(t) decreases between errors
and it increases whenever an error is detected.”

The Duane model, which is consistent with a decreasing hazard
rate, seems to show promise of being applicable to many kinds of
software. It has, therefore, been selected for description in this
section. If it proves to give poor forecasts for a particular
software project, then other methods such as Shooman’s [13,14],
Jelinski-Moranda’s De-Eutrophication [15,16], Lloyd-Lipow’s
Modified De-Eutrophication [17,18], Jelinski-Moranda’s Geometric
De-Eutrophication [19], Schick and Wolverton’s [22,23], or
Littlewood and Verrall’s [10] (which reflects programming
environmental factors), may be tried.

The Duane Growth Model [26] is a non-homogeneous Poisson process
which has been used to model the improvement of many industrial
activities. It has recently been applied with success to software
reliability [27]. Its functional form is

$\lambda_\Sigma = \lambda_1 T^{-\alpha}$  (6-1)

where

$\lambda_\Sigma$ = observed cumulative failure rate (total
failures ÷ total running time)

$\lambda_1$ = estimated initial failure rate ($\lambda$ at T = 1)

T = $\Sigma t$ = observed total operating time (hours, cycles or
missions)

$\alpha$ = estimated growth rate parameter

Alternatively, the model can be expressed in terms of MTBF,
defined as $\lambda^{-1}$.
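Equation (6-1) and its MTBF form translate directly into code. The sketch below is an illustration only; the function names and the sample parameter values (initial rate 4.0, growth rate 0.5, 100 time units) are hypothetical and not taken from the manual.

```python
def cumulative_failure_rate(lambda_1: float, alpha: float, total_time: float) -> float:
    """Duane model, equation (6-1): cumulative failure rate at total time T."""
    return lambda_1 * total_time ** (-alpha)


def cumulative_mtbf(lambda_1: float, alpha: float, total_time: float) -> float:
    """Cumulative MTBF, the reciprocal of the cumulative failure rate."""
    return 1.0 / cumulative_failure_rate(lambda_1, alpha, total_time)


# Hypothetical parameters chosen so the arithmetic is easy to follow.
rate = cumulative_failure_rate(4.0, 0.5, 100.0)   # 4 / sqrt(100)
mtbf = cumulative_mtbf(4.0, 0.5, 100.0)
print(rate, mtbf)
```

With these values the cumulative failure rate is 0.4 failures per unit time and the cumulative MTBF is 2.5 time units. At T = 1 the function returns the initial failure rate itself.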

The model is fitted to the data by computing the value of
$\lambda_\Sigma$ at each successive software failure and plotting
the data points $\lambda_\Sigma$ and T on log-log paper (or
$\log \lambda_\Sigma$ and $\log T$ on arithmetic paper). Data
described by the Duane model will invariably show acceptable
linearity as measured by standard correlation indices (e.g. r
statistic), and will become progressively more linear because of
the smoothing process inherent in plotting successive averages of
an increasing sample. This accounts for the excellent visual fit
achieved in most Duane plots.

In some programs it has been noted that very early test data
(i.e., the first few failures) do not exhibit satisfactory
linearity on a Duane plot. This effect, when it occurs, results
from the limited ability of early verification testing to simulate
operation of the software in a fully developed system. Thus, early
verification testing may be viewed as a “benign use environment.”
Some analysts omit this early data from reliability growth
computations; others have successfully fitted the Duane model to
such data by applying a constant multiplier, determined
empirically, to test time accrued in early tests. When this is
done, T/k replaces T in the model. Usually k falls in the range
1 < k < 5. Contractors may adopt either approach as applicable,
but must fully justify the validity of their reasoning in doing
so.

The location parameter $\lambda_1$ and the slope $\alpha$ of a
Duane curve are estimated directly from the graphic plot.
Forecasts are made by linear extrapolation of the best-fit line to
give a point estimate of $\lambda_\Sigma$ for future values of T.
This is a valid descriptive statistic under the assumption that
growth will continue in the future as in the past. Typical values
of $\alpha$ range from .2 to .7 with the average being close
to .4.
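The manual estimates the two parameters graphically. As a numerical counterpart, the sketch below fits a straight line to the logarithms of time and cumulative failure rate by ordinary least squares and then extrapolates it. Least squares is a substitution made for this example, not a procedure the manual prescribes, and the function name and sample data (generated from an assumed initial rate of 3.0 and growth rate of 0.4) are hypothetical.

```python
import math


def fit_duane(times, cumulative_rates):
    """Least-squares line through (ln T, ln rate); returns (lambda_1, alpha)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(r) for r in cumulative_rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # ln(rate) = ln(lambda_1) - alpha * ln(T), so alpha is minus the slope.
    return math.exp(intercept), -slope


# Synthetic data lying exactly on a Duane line (assumed parameters 3.0, 0.4).
times = [5.0, 10.0, 20.0, 40.0, 80.0]
rates = [3.0 * t ** (-0.4) for t in times]

lambda_1, alpha = fit_duane(times, rates)
forecast = lambda_1 * 160.0 ** (-alpha)   # extrapolate the line to T = 160
print(lambda_1, alpha, forecast)
```

Because the sample points lie exactly on the model, the fit recovers the assumed parameters. With real data the fitted line and a visually fitted line will generally differ somewhat.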

When software experiences reliability growth, the cumulative
failure rate, $\lambda_\Sigma$, is a pessimistic index, biased by
early unreliable performance, which is weighted equally with more
recent performance. The current failure rate, $\lambda(T)$, is
defined as the derivative of the number of failures, X, with
respect to operating time, T. Note that this is an unconditional
rate, not a hazard rate conditioned on survival to T. It is
representative of future performance allowing only for growth that
has already taken place. In terms of the growth model,

$\lambda(T) = \frac{dX}{dT} = \frac{d}{dT}\left(\lambda_\Sigma T\right)$

$= \frac{d}{dT}\left(\lambda_1 T^{-\alpha}\,T\right)$

$= (1-\alpha)\,\lambda_1 T^{-\alpha}$

$= (1-\alpha)\,\lambda_\Sigma$

It can be seen that the current failure rate improves in parallel
with the cumulative failure rate. It is important to remember that
this estimate is valid only after several failures have been
observed; $\lambda(T)$ should not be computed until a pattern of
reliability growth has been established.
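The relation between the current and cumulative failure rates can be verified numerically. The sketch below, using hypothetical parameter values and function names, differentiates the expected number of failures by a central difference and compares the result with (1 - alpha) times the cumulative rate.

```python
def expected_failures(lambda_1: float, alpha: float, total_time: float) -> float:
    """Failures implied by the Duane model: cumulative rate times T."""
    return lambda_1 * total_time ** (1.0 - alpha)


def current_failure_rate(cumulative_rate: float, alpha: float) -> float:
    """Current failure rate as (1 - alpha) times the cumulative failure rate."""
    return (1.0 - alpha) * cumulative_rate


# Hypothetical parameters for the check.
lambda_1, alpha, T = 4.0, 0.4, 50.0
cumulative_rate = lambda_1 * T ** (-alpha)

# Central-difference estimate of dX/dT at T.
h = 1e-4
numeric = (expected_failures(lambda_1, alpha, T + h)
           - expected_failures(lambda_1, alpha, T - h)) / (2.0 * h)

print(numeric, current_failure_rate(cumulative_rate, alpha))
```

The two printed values agree to many decimal places, confirming the derivation for these inputs.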

Cumulative failure rate, $\lambda_\Sigma$, or MTBF,
$\theta_\Sigma$, can be computed at any time but is normally
computed immediately after every relevant failure, in order to
provide proper data points for the growth model. The estimates are
$\lambda_\Sigma = X/T$ or $\theta_\Sigma = T/X$, where X is the
total number of relevant failures and T is total operating time.
These estimates may be made for subcategories of failures as
described above; as a minimum, they should be made for relevant
failures of priority 1 and priority 2 taken together as a group.

When a clear growth pattern has been established, the current
failure rate or MTBF should be estimated at the same time the
cumulative failure rate or MTBF is calculated. The estimates are
$\lambda(T) = (1-\alpha)\,\lambda_\Sigma$, or
$\theta(T) = 1/[(1-\alpha)\,\lambda_\Sigma]$.




The parameter $\alpha$ is the logarithmic growth rate, the slope
of the Duane plot. A consequence of a growth process having
constant $\alpha$ is that whenever the total test time T doubles,
the cumulative failure rate is decreased by a constant factor
$m = 2^{-\alpha}$. Growth or learning models are sometimes
referred to by m rather than $\alpha$. Thus, if $\alpha$ = .3,
m = .812 and the model could be termed an 81 percent learning
curve. Measured values of $\alpha$ in software programs usually
fall between .2 and .7, corresponding to learning curves of 87 and
61 percent. It can be seen that the arithmetic rate of improvement
declines steadily, corresponding to the diminishing returns
property exhibited in most growth processes.
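The learning-curve factor is a one-line computation. The sketch below evaluates it for the three growth rates mentioned above; the function name is hypothetical.

```python
def learning_factor(alpha: float) -> float:
    """Factor by which the cumulative failure rate shrinks when T doubles."""
    return 2.0 ** (-alpha)


for alpha in (0.2, 0.3, 0.7):
    print(f"alpha = {alpha:.1f}   m = {learning_factor(alpha):.3f}")
```

The factors come out near 0.871, 0.812 and 0.616. The percentages quoted in the text appear to be these values truncated, not rounded, to two digits.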

The rate $\alpha$ is computed as

$\alpha = \frac{\ln \lambda_1 - \ln \lambda_\Sigma}{\ln T}$  (6-2)

and should be reported as an indicator of the intensity and
effectiveness of project management relative to software
reliability improvement. It should be noted that $\alpha$ depends,
at least in part, on the level and consistency of management
stress on reliability improvement. Therefore, management can
improve $\alpha$ by intensifying efforts aimed at detecting and
correcting software failures, the primary means by which
reliability growth occurs. Routine recalculation of $\alpha$ after
each failure will quickly identify changes in this important trend
parameter.

Other software models of failure known as the seeding and tagging
models have received a great deal of attention in the past few
years [31,32]. Inasmuch as they do not, at present, provide
time-dependent reliability measures of software, but only an
estimate of the number of failures remaining in a computer
program, these models are not recommended for reliability
assessment.

Example  of  Duane  Modeling  of  Software 
Data 

The data in Figure 6-2A is derived from weekly summary reports
(5 days testing) for priority 1 and 2 software failures during the
software development program. Failures were fixed by competent
personnel before the next reporting period.

Time          Failures    Cumulative Failures    Failure Rate
$\Sigma t$    $x$         $\Sigma x$             $\lambda_\Sigma$
(days)                                           (Failures/Day)

5*            16          16                     3.2000
10*           8           24                     2.4000
15*           6           30                     2.0000
20*           5           35                     1.7500
25            10          45                     1.8000
30            9           54                     1.8000
35            8           62                     1.7714
40            10          72                     1.8000
45            9           81                     1.8000
50            11          92                     1.8400
55            6           98                     1.7818
60            6           104                    1.7333
65            6           110                    1.6923
70            5           115                    1.6429
75            6           121                    1.6133
80            7           128                    1.6000
85            6           134                    1.5765
90            5           139                    1.5444
95            6           145                    1.5263
100           5           150                    1.5000
105           4           154                    1.4667
110           6           160                    1.4545
115           7           167                    1.4522
120           5           172                    1.4333

*Benign Testing Environment

Figure 6-2A. Software Failures from Weekly Summary Reports

During the first 4 weeks (see asterisks in Figure 6-2A), the
testing environment was benign. When
$\lambda_\Sigma = \Sigma x / \Sigma t$ is plotted vs T on log-log
paper, figure 6-3 results. Notice the break after t = 20 days, and
the two trend lines exhibiting different slopes.

A test environment factor, k, found empirically to be equal to 2,
is now applied to the test time up to 20 days, so that the
original $\Sigma t$ = 10 days becomes the corrected $\Sigma t_c$ =
5 days, and the original $\Sigma t$ = 20 days becomes the
corrected $\Sigma t_c$ = 10 days. For $\Sigma t_c$ = 5,
x = 16 + 8 = 24 failures are accrued, and for $\Sigma t_c$ = 10,
x = 6 + 5 = 11 failures and $\Sigma x$ = 35 failures.
$\Sigma t$ = 25 then becomes $\Sigma t_c$ = 15 with x = 10
failures, and the last time $\Sigma t$ = 120 becomes
$\Sigma t_c$ = 110 with x = 5 failures (Figure 6-2B). Corrected
values $\lambda_\Sigma = \Sigma x / \Sigma t_c$ are recomputed and
plotted on Figure 6-4. The Duane plot is now approximately
straight, with a value of $\lambda_1$ estimated from the
intersection of a visually fitted straight line and the T = 1
axis. This gives $\lambda_1 \approx 5.95$, and from (6-2):

$\alpha = \frac{\ln(5.95) - \ln(172/110)}{\ln(110)} \approx 0.284$


Time                Failures   Cumulative         Failure Rate
$\Sigma t_c$ (days) $x$        Failures $\Sigma x$  $\lambda_\Sigma$ (Failures/Day)

  5                 24          24                4.8000
 10                 11          35                3.5000
 15                 10          45                3.0000
 20                  9          54                2.7000
 25                  8          62                2.4800
 30                 10          72                2.4000
 35                  9          81                2.3143
 40                 11          92                2.3000
 45                  6          98                2.1778
 50                  6         104                2.0800
 55                  6         110                2.0000
 60                  5         115                1.9167
 65                  6         121                1.8615
 70                  7         128                1.8286
 75                  6         134                1.7867
 80                  5         139                1.7375
 85                  6         145                1.7059
 90                  5         150                1.6667
 95                  4         154                1.6211
100                  6         160                1.6000
105                  7         167                1.5905
110                  5         172                1.5636

Figure 6-2B. Adjusted Software Failure Data


In this example, it is difficult to justify the factor k = 2 used to
correct the benign testing environment, unless it is arrived at before
a Duane plot is evolved. Notice also that the data given are not
entirely suitable for a Duane analysis. The 24 failures, for instance,
do not occur at time $\Sigma t_c = 5$, but in the interval 0-5 days.
One may obtain somewhat more accurate results by using mid-interval
time markers at 2.5, 7.5, 12.5 days, etc., instead of 5, 10, 15 days,
etc., but in any event, one loses information and obtains reduced
accuracy when one must use summary results.

For a more thorough treatment, including confidence limits on
$\lambda$, the expressions of § 5.4.4 can be used.
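The time correction and the estimate of $\alpha$ described in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and variable names are hypothetical, the weekly counts are those of Figure 6-2A, and $\lambda_1 = 5.95$ is the value read from the visually fitted line, not something the code derives.

```python
import math

# Weekly failure counts from Figure 6-2A (each report covers 5 test days).
weekly_failures = [16, 8, 6, 5, 10, 9, 8, 10, 9, 11, 6, 6,
                   6, 5, 6, 7, 6, 5, 6, 5, 4, 6, 7, 5]
DAYS_PER_REPORT = 5
BENIGN_REPORTS = 4   # the first 4 weekly reports came from the benign environment
K_ENV = 2            # empirical test environment factor k from the text


def corrected_rows(failures):
    """Return (corrected cumulative days, cumulative failures, cumulative rate)."""
    rows, t_c, total = [], 0.0, 0
    for week, x in enumerate(failures):
        # Benign test time is divided by k; later test time counts in full.
        t_c += DAYS_PER_REPORT / K_ENV if week < BENIGN_REPORTS else DAYS_PER_REPORT
        total += x
        rows.append((t_c, total, total / t_c))
    return rows


def duane_alpha(lambda_1, total_failures, total_time):
    """Growth exponent from the T = 1 intercept and the final cumulative point."""
    return (math.log(lambda_1) - math.log(total_failures / total_time)) / math.log(total_time)


rows = corrected_rows(weekly_failures)
alpha = duane_alpha(5.95, rows[-1][1], rows[-1][0])
print(rows[-1], round(alpha, 3))
```

The sketch also produces rows at 2.5 and 7.5 corrected days, which Figure 6-2B merges into the 5-day and 10-day rows; the rows from 5 days onward agree with the figure.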


6.6  COMMENTS  ON  SOFTWARE- 
HARDWARE  RELIABILITY 
ESTIMATION 

Some care must be exercised if one is to incorporate software
reliability in a system. Assume, for instance, that the program
described in the example of § 6.5 is to be incorporated in a serial
system of components. First, its reliability must be computed. Using
$\lambda = (1-\alpha)\lambda_1 T^{-\alpha}$ at T = 110 days, we obtain:

$$\lambda = (1 - 0.284)(5.95)(110)^{-0.284} = 1.12 \text{ failures/day}$$

Assume now that a mission consists in running the program for 3.863
minutes, or 0.002683 days. Then

$$R = e^{-\lambda t} = e^{-(1.12)(0.002683)} = 0.997.$$
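The two calculations above can be checked with a short sketch. The function names are hypothetical; the inputs ($\alpha = 0.284$, $\lambda_1 = 5.95$, T = 110 days, a 3.863-minute mission) are taken from the text.

```python
import math


def duane_instantaneous_rate(alpha, lambda_1, t_days):
    """Instantaneous failure rate (1 - alpha) * lambda_1 * T**(-alpha)."""
    return (1.0 - alpha) * lambda_1 * t_days ** (-alpha)


def mission_reliability(rate_per_day, mission_days):
    """Exponential mission reliability exp(-lambda * t)."""
    return math.exp(-rate_per_day * mission_days)


rate = duane_instantaneous_rate(0.284, 5.95, 110.0)
mission_days = 3.863 / (24 * 60)          # minutes converted to days
reliability = mission_reliability(rate, mission_days)
print(round(rate, 2), round(reliability, 3))
```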

The system is shown in figure 6-5. R represents a radar with mission
reliability 0.996, I represents a mechanical/electrical interface with
mission reliability = 0.998, S represents the software with
reliability = 0.997, C is the computer hardware with mission
reliability = 0.999, and L is a launcher with mission
reliability = 0.995. In this simple case, the system reliability,
assuming independence of the components, would be simply:
(0.996) (0.998) (0.997) (0.999) (0.995) = 0.985, and the unreliability
would be 0.015.

Assume now that the computer-software-interface subsystem is
inexpensive compared to the radar and launcher, and that it is decided
to reduce the unreliability by adopting the redundant configuration
shown in figure 6-6.

Figure 6-3. Duane Plot: Benign Early Software Testing Environment

Figure 6-4. Duane Plot: Operational Software Testing Environment

Figure 6-5. Radar System with Software

It is not difficult to calculate that the redundant system of
independent components, with two interfaces, two programs and two
computers, has an unreliability of 0.009, which represents an
improvement of 40% in unreliability over the simple series
configuration. But such a calculation would, in this case, be
incorrect for the reasons discussed below.

If S1 fails, then by definition S2 would fail, since the software is
identical and would reach the same point in both programs and would
never be able to perform that function (get past the error). Figure
6-6, however, ignores the particular characteristics of “redundant”
software which contains identical latent errors. It is then very
probable that S1 and S2 will succeed or fail identically if they
receive nearly identical inputs I1 and I2. Under these conditions,
effectively figure 6-7, S1 and S2 are practically totally dependent,
and the system unreliability in this case is 0.012, which represents
an improvement of only 20% in unreliability over the simple series
configuration (Figure 6-5).
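The three unreliability figures quoted in this section (0.015, 0.009 and 0.012) can be reproduced with the sketch below. The block layout assumed for the dependent case is an interpretation, since figure 6-7 is not reproduced here: the software is treated as a single series block, and only the interface-computer hardware pairs are duplicated. Helper names are hypothetical.

```python
from math import prod


def series(*r):
    """Reliability of independent blocks in series."""
    return prod(r)


def parallel(*r):
    """Reliability of independent blocks in active parallel."""
    return 1.0 - prod(1.0 - x for x in r)


# Mission reliabilities from the text: radar, interface, software, computer, launcher.
R, I, S, C, L = 0.996, 0.998, 0.997, 0.999, 0.995

simple = series(R, I, S, C, L)

# Naive model: two fully independent interface-software-computer chains.
chain = series(I, S, C)
independent = series(R, parallel(chain, chain), L)

# Assumed dependent model: identical software fails in both chains together,
# so S stays in series and only the hardware (I, C) is truly redundant.
hardware = series(I, C)
dependent = series(R, parallel(hardware, hardware), S, L)

for name, value in [("simple", simple), ("independent", independent), ("dependent", dependent)]:
    print(name, round(1.0 - value, 3))
```

Under these assumptions the software block caps the benefit of redundancy: duplicating hardware around identical software removes only the hardware contribution to unreliability.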
Section 7

ASSESSING  SYSTEM  RELIABILITY 
AND  AVAILABILITY 


Methods for point and interval assessment of the reliability of
individual components have been described in Section 5. In that
context, the term component was understood to apply to any assembly
level, provided that the test data were taken at the assembly level of
the component being assessed. In this section, these methods are
extended to perform system level reliability and availability
assessment. This assessment is also based on the data being derived at
the component level, where the components are the constituent elements
of the system, but these component assessments are combined to provide
estimates of system parameters.

Point estimation of system reliability and availability is by far the
simpler and is discussed briefly first. System interval estimation is
more complex and is discussed at greater length.

7.1  POINT ESTIMATES FOR SYSTEMS

Point estimates for the reliability and availability of systems can be
obtained by inserting the point estimate for each component into the
reliability or availability model of the system, and solving the
system equations. Some examples are given below.

For a serial system, the reliability of a system composed of n
independent components is given by:

$$R_s(t) = \prod_{i=1}^{n} R_i(t) \tag{7-1}$$

where $R_i(t)$ is the reliability of the ith element. If estimates
$\hat{R}_i(t)$ of the $R_i(t)$'s have been obtained from test data,
then a point estimate of $R_s(t)$ is:

$$\hat{R}_s(t) = \prod_{i=1}^{n} \hat{R}_i(t) \tag{7-2}$$

For a parallel system of n independent components, the system model
yields:

$$R_s(t) = 1 - \prod_{i=1}^{n} [1 - R_i(t)] \tag{7-3}$$

If $\hat{R}_i(t)$ are estimates of $R_i(t)$, then an estimate of system
reliability is

$$\hat{R}_s(t) = 1 - \prod_{i=1}^{n} [1 - \hat{R}_i(t)] \tag{7-4}$$

For mixed series-parallel systems of independent components, all
series combinations and all parallel combinations of elements are
reduced to single elements through the equations given above, to yield
a point estimate of system reliability. The same procedure is applied
to the availability model to obtain a point estimate of system
availability [§ 4.2.2 discusses modeling].
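The reduction of a mixed series-parallel model by repeated use of equations (7-2) and (7-4) can be sketched as a small recursive evaluator. The nested-tuple representation of the block diagram is an assumption made for this illustration, not a notation used by the handbook.

```python
from math import prod


def evaluate(block):
    """Point estimate for a block diagram of independent components.

    A block is either a number (a component reliability estimate) or a
    tuple (kind, children) where kind is "series" or "parallel".
    """
    if isinstance(block, (int, float)):
        return float(block)
    kind, children = block
    values = [evaluate(child) for child in children]
    if kind == "series":                      # equation (7-2)
        return prod(values)
    if kind == "parallel":                    # equation (7-4)
        return 1.0 - prod(1.0 - v for v in values)
    raise ValueError(f"unknown block kind: {kind!r}")


# Hypothetical system: one element in series with a two-element parallel stage.
system = ("series", [0.9, ("parallel", [0.8, 0.8])])
print(evaluate(system))
```

The same evaluator applies to an availability model if component availability estimates are supplied in place of reliabilities.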

An example illustrating the above techniques, used in the evaluation
of missile flight reliability, is provided in appendix B.

7.1.1  The  Rubinstein  Method,  Serial  Systems 
of  Exponential  Components 

The Rubinstein method, fully described in [1], provides an estimate of
each component failure rate $\hat{\lambda}$ and its variance
$\hat{\sigma}^2$. Because the components are independent and follow an
exponential failure law, the point estimate of failure rate for a
serial system of n components is:

$$\hat{\lambda}_s = \sum_{i=1}^{n} \sum_{j} \sum_{k} \hat{\lambda}_{ijk} \tag{7-5}$$

where the subscript i indicates component, j indicates environment and
k indicates test state. Then:

$$\hat{R}_s(t) = e^{-\hat{\lambda}_s t} \tag{7-6}$$

Because the components are independent, the estimate of the variance
of $\hat{\lambda}_s$ is:

$$\hat{\sigma}^2_{\lambda_s} = \sum_{i} \sum_{j} \sum_{k} \frac{\hat{\lambda}_{ijk}}{t_{ijk}} \tag{7-7}$$

and $C_s$, a quantity needed in the sequel, is:

$$C_s = \hat{\sigma}^2_{\lambda_s} / \hat{\lambda}_s \tag{7-8}$$

Example,  Point  Estimate  for  Two-Component 
System  -  Rubinstein  Method 


A two-component serial system is modeled in figure 7-1. Each component
is tested separately. For component 1 in high temperature,
non-operating (subscripts 1ha), four tests are performed with the
following results:

Test 1. Component failed after 142 minutes of a scheduled 300 minute
test ($\alpha t$ = 14.2 equivalent missions).

Test 2. Component did not fail in 300 minutes of testing
($\alpha t$ = 30).

Test 3. Component failed after 147 minutes of a scheduled 300 minutes
of testing ($\alpha t$ = 14.7).

Test 4. Component did not fail in 300 minutes of testing
($\alpha t$ = 30).

Then:

$$\hat{\lambda}_{1ha} = \frac{x_{1ha}}{t_{1ha}} \left( \frac{2N_{1ha}}{2N_{1ha}+1} \right)$$

where:

$x_{1ha}$ is the total number of failures on component 1, environment
h, and test condition a.

$t_{1ha}$ is the total test time in equivalent missions on component
1, environment h, and test condition a.

$N_{1ha}$ is the number of units of component 1 tested in environment
h and test condition a.


Component  Environment    Test            Mission Exposure  $\alpha$
                          Condition       Time (Min.)       (Missions/Minute)

1          High Temp (h)  Non-Oper (a)    10.00             0.10
                          Operating (d)    0.50             2.00
           Vibration (v)  Non-Oper (a)     0.25             4.00
                          Operating (d)   20.00             0.05
2          High Temp (h)  Non-Oper (a)    10.00             0.10
                          Operating (d)    1.00             1.00
           Vibration (v)  Non-Oper (a)    NONE              --
                          Operating (d)    5.00             0.20

Figure 7-1. Serial Subsystem Block Diagram and Test Data

$$\hat{\lambda}_{1ha} = \frac{(1+0+1+0)}{(14.2+30.0+14.7+30.0)} \cdot \frac{2 \times 4}{(2 \times 4)+1} = 0.0200 \text{ failures/mission}$$
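The estimate for the 1ha combination can be reproduced as follows. This is a minimal sketch: the function name is hypothetical, and it assumes one unit per test, so that N equals the number of tests (N = 4 here, as in the worked example).

```python
def rubinstein_rate(failures_per_test, mission_equivalents_per_test):
    """Failure rate estimate (x / t) * 2N / (2N + 1) for one
    component-environment-test state combination."""
    n_units = len(failures_per_test)            # assumption: one unit per test
    x = sum(failures_per_test)                  # total failures
    t = sum(mission_equivalents_per_test)       # total equivalent missions
    return (x / t) * (2 * n_units) / (2 * n_units + 1)


ALPHA_1HA = 0.10                                # missions per minute (Figure 7-1)
test_minutes = [142, 300, 147, 300]             # exposure accumulated in each test
failures = [1, 0, 1, 0]                         # tests 1 and 3 ended in failure

equivalent_missions = [ALPHA_1HA * m for m in test_minutes]
rate_1ha = rubinstein_rate(failures, equivalent_missions)
print(round(rate_1ha, 4))
```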

Figure  7-2  summarizes  the  test  data  and 
failure  rate  estimates. 


ijk   Test Time   $\alpha$  $t_{ijk}$   $N_{ijk}$     $x_{ijk}$  $\hat{\lambda}_{ijk}$
      (minutes)             (M.E.*)

1ha    889.0      0.10       88.9       4             2          .02
1hd     45.45     2.00       90.9       5             1          .01
1va     21.425    4.00       85.7       3             1          .01
1vd   2000.0      0.05      100.0       3             0          .00
2ha    923.0      0.10       92.3       (illegible)   1          .01
2hd    100.0      1.00      100.0       1             0          .00
2vd    500.0      0.20      100.0       (illegible)   0          .00

*M.E. = Mission Equivalents

Figure 7-2. Test Data and Failure Rate Estimates

Point estimates are built up by progressive summation:

$$\hat{\lambda}_{1h} = \hat{\lambda}_{1ha} + \hat{\lambda}_{1hd} = .02 + .01 = .03 \text{ failures/mission}$$

$$\hat{\lambda}_{1v} = \hat{\lambda}_{1va} + \hat{\lambda}_{1vd} = .01 + .00 = .01 \text{ failures/mission}$$

$$\hat{\lambda}_{2h} = \hat{\lambda}_{2ha} + \hat{\lambda}_{2hd} = .01 + .00 = .01 \text{ failures/mission}$$

$$\hat{\lambda}_{2v} = \hat{\lambda}_{2vd} = .00 \text{ failures/mission}$$

and the corresponding reliabilities are (where t = one mission):

$$\hat{R}_{1h} = e^{-.03} = .9705$$

$$\hat{R}_{1v} = e^{-.01} = .9901$$

$$\hat{R}_{2h} = e^{-.01} = .9901$$

$$\hat{R}_{2v} = e^{-.00} = .9999+$$

Perfect reliability, as reflected in the final calculation, is
acknowledged to be unattainable. It appears as a consequence of the
preceding failure rate estimate $\hat{\lambda}_{2v} = .00$. Neither
number can be interpreted as a valid point estimate of a statistical
parameter, but rather as an indication that such an estimate cannot be
made until at least one failure is observed. For this reason,
engineers sometimes omit reporting an estimate of unit reliability or
reduce it arbitrarily to .9999+. These precautions are unnecessary if
the users understand that such an estimate is not an assertion of
certain success. It should be noted that confidence limits computed
for the zero failure case are valid.

At component level:

$$\hat{\lambda}_1 = \hat{\lambda}_{1h} + \hat{\lambda}_{1v} = .04 \text{ failures/mission}$$

$$\hat{\lambda}_2 = \hat{\lambda}_{2h} + \hat{\lambda}_{2v} = .01 \text{ failures/mission}$$

At system level:

$$\hat{\lambda}_s = \hat{\lambda}_1 + \hat{\lambda}_2 = .05 \text{ failures/mission}$$

$$\hat{R}_s = e^{-.05} = .9512$$

7.1.2 The Rubinstein Method, Parallel
System of Exponential Components

Figure 7-3 models a two-element active parallel system consisting of
two of the components examined in the previous example. The failure
rate of each component has been estimated as .05 failures/mission.

Figure 7-3. Two-Component Parallel System

From equation (7-4),

$$\hat{R}_s = 1 - (1-\hat{R}_1)(1-\hat{R}_2) = e^{-\hat{\lambda}_1} + e^{-\hat{\lambda}_2} - e^{-(\hat{\lambda}_1 + \hat{\lambda}_2)}$$

and

$$\hat{R}_s = .9976$$
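The progressive summation and the two-element parallel estimate can be checked with the sketch below. The dictionary keys follow the ijk labels of figure 7-2; the helper name is hypothetical.

```python
import math

# Rounded failure rate estimates per component-environment-test state (Figure 7-2).
rates = {"1ha": 0.02, "1hd": 0.01, "1va": 0.01, "1vd": 0.00,
         "2ha": 0.01, "2hd": 0.00, "2vd": 0.00}


def summed_rate(prefix):
    """Sum the estimates whose label starts with the given prefix."""
    return sum(v for label, v in rates.items() if label.startswith(prefix))


lam_1 = summed_rate("1")                  # component 1, all environments and states
lam_2 = summed_rate("2")                  # component 2
lam_s = lam_1 + lam_2                     # serial system, equation (7-5)
r_s = math.exp(-lam_s)                    # equation (7-6) with t = one mission

# Two such elements in active parallel, equation (7-4).
r_parallel = 1.0 - (1.0 - r_s) ** 2
print(round(lam_s, 2), round(r_s, 4), round(r_parallel, 4))
```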


7.1.3  Bayesian Approach Used with the
Rubinstein Method

The Bayesian formalism makes use of prior knowledge about each
component. When the prior failures are $\phi$ and prior test times are
$\tau$, the ratio $\phi/\tau$ determines the mean of the prior failure
rate $\lambda_0$, with $\sigma^2_{\lambda_0} = \phi/\tau^2$. If the
gamma distribution is chosen as a prior, then the posterior
distribution will also be gamma with posterior parameters $\phi + x$
and $\tau + t$. Posterior failure rate for component i is:

$$\hat{\lambda}_i = (\phi_i + x_i)/(\tau_i + t_i) \tag{7-9}$$

and the variance of $\hat{\lambda}_i$ is given by:

$$\hat{\sigma}^2_{\lambda_i} = (\phi_i + x_i)/(\tau_i + t_i)^2 \tag{7-10}$$

Predicted reliability is used as a basis for establishing the prior
point estimate. But Bayesian methods require a statement of confidence
in the estimate. This is achieved by specifying the variance of the
prior estimate. The point estimate fixes the ratio $\phi/\tau$.
Subsequent selection of $\tau$ defines the variance $\phi/\tau^2$. If
$\tau$ is large, the prior is strong and relatively difficult for test
data to modify; if $\tau$ is small the prior is weak and easily
discounted by test data. Figure 5-28 gives empirical rules for
choosing $\tau$.
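The effect of prior strength can be illustrated with a short sketch of equations (7-9) and (7-10). The numbers below are hypothetical and chosen only to contrast a weak and a strong prior built on the same predicted failure rate; the function names are likewise illustrative.

```python
def gamma_prior(predicted_rate, prior_missions):
    """Prior parameters (phi, tau): tau is chosen, phi = predicted rate * tau."""
    return predicted_rate * prior_missions, prior_missions


def posterior(phi, tau, failures, test_missions):
    """Posterior mean (7-9) and variance (7-10) of the failure rate."""
    phi_post = phi + failures
    tau_post = tau + test_missions
    return phi_post / tau_post, phi_post / tau_post ** 2


# Hypothetical case: prediction 0.01 failures/mission, test shows 3 failures
# in 100 equivalent missions (an observed rate of 0.03).
weak_mean, weak_var = posterior(*gamma_prior(0.01, 10), failures=3, test_missions=100)
strong_mean, strong_var = posterior(*gamma_prior(0.01, 1000), failures=3, test_missions=100)
print(round(weak_mean, 5), round(strong_mean, 5))
```

With the weak prior the posterior mean moves most of the way toward the observed rate, while the strong prior holds it close to the prediction.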

Bayesian methods require that predictions be available to initiate the
computations. Since test data are normally generated for individual
component-environment-test state combinations, predictions must be
made separately for each such combination. This is a departure from
the usual practice of making predictions for components, in which the
effects of environment-test state combinations are tacitly aggregated.
In most cases, however, the availability of published derating curves
and similar application factors allows predictions to be carried down
to the necessary detail. When this cannot be done credibly, the user
is faced with the possible need to discount some of the test data that
will subsequently be obtained, usually at substantial cost to the
program. In that instance, the most conservative but least efficient
procedure is to employ in the component calculations the minimum test
time accrued by that component in any environment-test condition
combination. A better procedure

is to employ the harmonic mean of the applicable test times. It is
given by:

$$t_{iE} = \left[ \frac{1}{n_i} \sum_{j} \sum_{k} \frac{1}{t_{ijk}} \right]^{-1} \tag{7-11}$$

$t_{iE}$ = number of effective equivalent missions for ith component

$t_{ijk}$ = number of equivalent missions in the jth environment and
the kth test condition

$n_i$ = number of environment-test condition combinations applicable
to the ith component

The value $t_{iE}$ is added to $\tau_i$ and the number of actual
failures $x_i$ is added to $\phi_i$ for each one of the components of
the system. Then the system reliability is recomputed from the system
model equations.
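The two choices of effective test time, the minimum and the harmonic mean, can be compared with Python's standard library. As an illustration only, the sketch uses the four equivalent-mission totals recorded for component 1 in figure 7-2; treating those four values as the applicable test times is an assumption of this example.

```python
from statistics import harmonic_mean

# Equivalent missions for component 1 in its four environment-test state
# combinations (1ha, 1hd, 1va, 1vd) from Figure 7-2.
test_times = [88.9, 90.9, 85.7, 100.0]

conservative_time = min(test_times)           # most conservative, least efficient
effective_time = harmonic_mean(test_times)    # harmonic mean of the test times
print(conservative_time, round(effective_time, 2))
```

The harmonic mean lies between the minimum and the arithmetic mean, so it credits more of the accumulated test time than the minimum does while still being pulled toward the least-tested combination.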

Example, Point Estimate of Four-Component Series System Reliability,
Bayesian - Rubinstein Method

Assume a four-component series system, with the prediction results set
forth below (Figure 7-4).

For Component 1, prior information consists of similar handbook
information (Category 4 of figure 5-28), estimated failure rate is
0.0004 failures per mission, and tests planned for the component will
total 200 equivalent missions. [Note the symbol $\lambda_0$ indicates
a prediction.]

$$\lambda_{01} = 0.0004 \text{ failures per mission}$$
$$\tau_1 = (1/4)(200) = 50 \text{ prior missions}$$
$$\phi_1 = \lambda_{01} \tau_1 = (0.0004)(50) = .02 \text{ prior failures}$$

Component 2 has information of a frequency nature that best fits
Category 7. Its prior values become:

$$\lambda_{02} = .0067 \text{ failures per mission}$$
$$\tau_2 = (0.3)(600) = 180 \text{ prior missions}$$
$$\phi_2 = (.0067)(180) = 1.20 \text{ prior failures}$$

Component 3 also has frequency data available from a previous program
where both the component and the mission are judged identical to the
new program (Category 8).

$$\lambda_{03} = .0060 \text{ failures per mission}$$
$$\tau_3 = (1.0)(1000) = 1000 \text{ prior missions}$$
$$\phi_3 = (.006)(1000) = 6 \text{ prior failures}$$

Information on Component 4 consists of generic data only (Category 3).
Analysis of tests planned for component 4 indicates that 720
equivalent missions of testing will be conducted with a predicted
failure rate of 0.00011. Then:

$$\lambda_{04} = .00011 \text{ failures per mission}$$
$$\tau_4 = (1/8)(720) = 90 \text{ prior missions}$$
$$\phi_4 = \lambda_{04} \tau_4 = (.00011)(90) = .0099 \approx .01 \text{ prior failures}$$

Component  Prior         Category   Planned Test Time  Prior Freq.  Prior Parameters
           Failure Rate  Selected*  in Equivalent      Data         $\phi$    $\tau$
                                    Missions

1          0.00040       4          200                N/A          0.02      50
2          0.00670       7          N/A                4/600        1.20      180
3          0.00600       8          N/A                6/1000       6.00      1000
4          0.00011       3          720                N/A          0.01      90

System     0.01321

*See Figure 5-28

Figure 7-4. Bayesian Prediction Results

The component failure rates are added to give a prior estimate of
system failure rate and system reliability is estimated. These values
are:

$$\lambda_{0s} = 0.0132 \text{ failures/mission}$$
$$R_{0s} = e^{-.0132} = .9869$$

Assume that three successive reliability reports are prepared during
the program, each embodying calculations of reliability. Cumulative
test and failure data at each report are tabulated below. Note that
the rate of testing is not uniform throughout the program.

To recompute the system estimated $\hat{\lambda}_s$ and $\hat{R}_s$
after the 1st report, for instance, one starts with the prediction
model,

$$\lambda_{0s} = \frac{0.02}{50} + \frac{1.20}{180} + \frac{6.00}{1000} + \frac{0.01}{90} = 0.0132 \text{ failures/mission}$$

which is updated with the Report No. 1 data from figure 7-5 as
follows:

$$\hat{\lambda}_{s(\text{Report 1})} = \frac{0.02+0}{50+20} + \frac{1.20+1}{180+10} + \frac{6.00+0}{1000+15} + \frac{0.01+0}{90+20} = 0.01787 \text{ failures/mission}$$

$$\hat{R}_{s(\text{Report 1})} = e^{-\hat{\lambda}_{s(1)}} = e^{-0.01787} = 0.9823$$
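The report-by-report update can be sketched as follows, using the prior parameters of figure 7-4 and the cumulative test data for Reports 1 and 2. The data layout and function name are assumptions made for this illustration.

```python
import math

# Prior parameters (phi, tau) per component, from Figure 7-4.
priors = {1: (0.02, 50), 2: (1.20, 180), 3: (6.00, 1000), 4: (0.01, 90)}

# Cumulative (failures x, equivalent missions t) per component at each report.
reports = {
    1: {1: (0, 20), 2: (1, 10), 3: (0, 15), 4: (0, 20)},
    2: {1: (0, 50), 2: (1, 20), 3: (1, 50), 4: (0, 100)},
}


def system_estimate(priors, data=None):
    """Series-system failure rate and reliability for one mission.

    With data=None the prior (prediction) is returned; otherwise each
    component is updated as (phi + x) / (tau + t) before summing.
    """
    total = 0.0
    for component, (phi, tau) in priors.items():
        x, t = data[component] if data else (0, 0)
        total += (phi + x) / (tau + t)
    return total, math.exp(-total)


prior_rate, prior_rel = system_estimate(priors)
rate_1, rel_1 = system_estimate(priors, reports[1])
rate_2, rel_2 = system_estimate(priors, reports[2])
print(round(rate_1, 5), round(rel_1, 4), round(rate_2, 5), round(rel_2, 4))
```

Because the data are cumulative, each report is computed from the original prior plus all data to date, not from the previous report's posterior plus an increment.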


             Report No. 1    Report No. 2    Report No. 3
Component    x      t        x      t        x      t

1            0      20       0      50       0      (illegible)
2            1      10       1      20       1      (illegible)
3            0      15       1      50       3      (illegible)
4            0      20       0      100      0      (illegible)

Figure 7-5. Test and Failure Data for Bayesian Analysis

Proceeding in the same manner for the other reports, the point
estimated values of failure rate for these reports are shown in
figure 7-6.

              Prediction         Report 1           Report 2           Report 3
              $\hat{\lambda}$  $\hat{R}$   $\hat{\lambda}$  $\hat{R}$   $\hat{\lambda}$  $\hat{R}$   $\hat{\lambda}$  $\hat{R}$

Component 1   0.00040  .9996     0.00029  .9997     0.00020  .9998     0.00008  .9999
Component 2   0.00670  .9933     0.01158  .9885     0.01100  .9891     0.00786  .9922
Component 3   0.00600  .9940     0.00591  .9941     0.00667  .9934     0.00600  .9940
Component 4   0.00011  .9999     0.00009  .9999     0.00005  .9999     0.00002  .9999
System        0.01321  .9869     0.01787  .9823     0.01792  .9822     0.01396  .9861

Figure 7-6. Prediction and Posterior Estimates of
Failure Rate and Reliability

7.1.4  Point  Estimate  of  Steady-State 
Availability  for  Series  and  Parallel 
Systems 

A point estimate of steady-state availability for a serial system of n
independent exponential components is the product of the estimate of
availability for each component, or

$$\hat{A}_s = \prod_{i=1}^{n} \frac{\hat{\mu}_i}{\hat{\mu}_i + \hat{\lambda}_i} \tag{7-12}$$

where $\hat{\mu}_i$ and $\hat{\lambda}_i$ are respectively the repair
rate and failure rate point estimates for the ith component.

For a parallel system of n identical components, m of which must be
operable for the system to be available, the point estimate is:

$$\hat{A}_s = \frac{\sum_{k=m}^{n} \binom{n}{k} \hat{\mu}^{k} \hat{\lambda}^{n-k}}{(\hat{\lambda} + \hat{\mu})^{n}} \tag{7-13}$$

In series-parallel systems of components, this equation can be used to
obtain a point estimate of availability for any serial stage. A serial
stage is a group of components which may combine in series with other
components or stages to form the system, but is itself composed of
components or serial strings of components arranged in m-of-n active
parallel redundancy. For example, figure 7-7 depicts a system
consisting of two serial stages. The serial structure that remains
after estimates have been obtained for all stages permits system
availability to be estimated as the product of the stage
availabilities, or: $\hat{A}_s = \hat{A}_{s1} \cdot \hat{A}_{s2}$.

7.1.5 Reliability and Availability Assessment
of Redundant Non-Repairable and
Repairable Systems

The reliability and availability assessment of many systems more
complex than series and parallel can be obtained by considering Birth
and Death processes and solving the differential equations which arise
from a state transition matrix formulation of these processes. A few
cases are solved in Appendix D, § D.3, with the results indicated in
§ 7.1.5.1, § 7.1.5.2 and § 7.1.5.3.

7.1.5.1 MTBF of a 4 of 6 Repairable System
with Restricted Repair

Out of 6 identical components in parallel, 4 must be operating for the
system to be operational. For such a system with a single repairman
available (restricted repair), the MTBF is expressed in terms of
$\lambda$ and $\mu$ (derivation in Appendix D, § D.3).

Figure 7-7. System of Two Serial Stages (Stage 1 (S1) and
Stage 2 (S2)), Each with Identical Components


Thus  an  estimated  MTBF  is: 

I  (iflTT2  -H  2  JT  X>30(K)” 
MTBF  =  ~  — - 

^  |_  120(£)J  _ 

7.1.5.2 Reliability of a 1 of 3 Standby
System with Dormant Hazard Rate
and No Repair

Assuming perfect switching, $\lambda$ = operating failure rate,
$\lambda_D$ = dormant failure rate, then:

R(t)  =  _  e  <**3*n>'  j 


Again an estimate of R(t), $\hat{R}(t)$, is obtained by replacing
$\lambda$ and $\lambda_D$ by $\hat{\lambda}$ and $\hat{\lambda}_D$,
respectively, in the expression for R(t).

Appendix A provides an example of an availability analysis of a Fire
Control Subsystem.

7.1.5.3 MTBF and MTTR of M of N Identical
Repairable Components in Parallel
with Restricted Repair - The
Einhorn Approximations

Only one component is under repair at any one time. If $\mu$ is large
compared to $\lambda$:

(„Nt)  « 

(7-14) 

and

$$\mathrm{MTTR} = \left( \frac{1}{\mu} \right) / (N-M+1) \tag{7-15}$$

Then, point estimates of MTBF and MTTR are obtained by replacing
$\lambda$ and $\mu$ by $\hat{\lambda}$ and $\hat{\mu}$ in equations
(7-14) and (7-15); in particular,

$$\widehat{\mathrm{MTTR}} = \left( \frac{1}{\hat{\mu}} \right) / (N-M+1)$$

A vast source of information on Birth and Death processes applied to
reliability and availability system assessment is available [2].
Tables for m and n configurations are provided in appendix E, § E.4,
figure E-11.


7.1.6 Reliability Point Estimates for
Complex Systems

For logically complex systems as illustrated in § 4.2.2.1.2.2 through
§ 4.2.2.1.2.4, binomial modeling, conditional modeling or minimal cut
modeling allows one to express system reliability in terms of the
individual component reliabilities. To obtain an estimate of system
reliability, it is only necessary to replace known component
reliabilities by their estimates in the expression for system
reliability.

In § 4.2.2.1.2.2, for instance, the estimated expression for $R_s$,
$\hat{R}_s$, becomes:

$$\hat{R}_s = (1-\hat{R}_c)\hat{R}_d\hat{R}_e + \hat{R}_c(1-\hat{R}_d)\hat{R}_e + \hat{R}_c\hat{R}_d(1-\hat{R}_e) + \hat{R}_c\hat{R}_d\hat{R}_e$$

7.1.7 Reliability Point Estimates for
Software-Hardware Systems

Only when different versions of software are operated together is
software truly redundant. Replicated software and the frequently used
“casualty programs”, which are simply subsets of the primary system
software, are not redundant from a reliability viewpoint.
Non-redundant software is generally represented as a single block in
the main sequence of a system reliability block diagram.

When software is incorporated in a system model, the procedure to
obtain point estimates of system reliability proceeds strictly as it
would if only hardware were involved. That is, the reliability of each
block of the diagram is estimated from test data, and the system
equation is solved.

See an example of software reliability assessment in § 6.5 and of a
“redundant” software-hardware system assessment in § 6.6.

7.2  INTERVAL ESTIMATES FOR
SYSTEMS

7.2.1 Series, Parallel and Series-Parallel
Systems of Exponential Components
- The Rubinstein Method

It has been indicated that for a serial system the Rubinstein method
provides a point estimate of system failure rate by solving:

$$\hat{\lambda}_s = \sum_{i} \sum_{j} \sum_{k} \hat{\lambda}_{ijk} \tag{7-5}$$

Also, the estimate of $\sigma^2_{\lambda_s}$, the variance of
$\hat{\lambda}_s$, was shown to be

$$\hat{\sigma}^2_{\lambda_s} = \sum_{i} \sum_{j} \sum_{k} \frac{\hat{\lambda}_{ijk}}{t_{ijk}} \tag{7-7}$$

A good approximation for the upper limit at confidence $\gamma$ for
failure rate is given by:

$$\lambda_u = \max \left\{ \frac{2\hat{\lambda} + (\beta K)^2 C + \sqrt{4\hat{\lambda}(\beta K)^2 C + (\beta K)^4 C^2}}{2},\ \frac{(\beta K)^2}{\text{smallest } t_{ijk} \text{ (with no failures)}} \right\} \tag{7-16}$$

where

$$C_s = \hat{\sigma}^2_{\lambda_s} / \hat{\lambda}_s \tag{7-8}$$

In this expression, K is the standard normal deviate for specified
confidence level (e.g., K = 0.842 for 80% confidence), and $\beta$ is
a bias correction factor tabulated in figure 5-25 for a confidence
$\gamma$ = 0.80. The upper limit on failure rate in the no-failure
case is:

$$\lambda_u = \frac{(\beta K)^2}{\text{smallest } t_{ijk}} \tag{7-17}$$

A lower confidence limit is of principal interest when reliability is
estimated. It is obtained by substituting the corresponding upper
limit for failure rate into the reliability equation:

$$R_L = e^{-\lambda_u t} \tag{7-18}$$

Again, t = one mission.

With increasing test time the lower reliability confidence limit $R_L$
approaches the best estimate, $\hat{R}$, asymptotically. But $\hat{R}$
may itself increase when corrective actions are effective in
eliminating failure modes. If operating experience subsequent to
corrective action gives convincing evidence that a failure mode has
been eliminated, past failures in that mode may be deleted from the
data base used to compute reliability. Under these conditions
successive estimates of $\hat{R}$ made during a development program
will form a growth curve. Such a curve is of value for visualizing
program progress, and with care, it can also be extrapolated to
predict future reliability growth.
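Equations (7-16) through (7-18) can be sketched as follows. The function names are hypothetical. The first branch is written as the larger root of $(\lambda_u - \hat{\lambda})^2 = (\beta K)^2 C \lambda_u$, a form inferred from the worked numbers of this section rather than quoted from the handbook. The values of $\beta$ are those read from figure 5-25 in the worked example (1.272 for three failures, 1.5074 for zero failures).

```python
import math


def upper_failure_rate(lam_hat, c, beta, k, smallest_zero_failure_time=None):
    """Approximate upper confidence limit on failure rate, equation (7-16).

    smallest_zero_failure_time is the smallest t_ijk with no failures; when
    it is None the second branch of the max is skipped.
    """
    a = (beta * k) ** 2 * c
    root = (2.0 * lam_hat + a + math.sqrt(4.0 * lam_hat * a + a * a)) / 2.0
    if smallest_zero_failure_time is None:
        return root
    return max(root, (beta * k) ** 2 / smallest_zero_failure_time)


def zero_failure_upper_rate(beta, k, smallest_time):
    """Upper limit when no failures were observed, equation (7-17)."""
    return (beta * k) ** 2 / smallest_time


def lower_reliability(lam_upper, missions=1.0):
    """Lower confidence limit on reliability, equation (7-18)."""
    return math.exp(-lam_upper * missions)


K_80 = 0.842                                   # normal deviate, 80% confidence
lam_u = upper_failure_rate(0.03, 0.011, beta=1.272, k=K_80)
lam_u_zero = zero_failure_upper_rate(beta=1.5074, k=K_80, smallest_time=100.0)
print(round(lam_u, 4), round(lower_reliability(lam_u), 4), round(lam_u_zero, 4))
```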


Example, Interval Estimation for Series System Reliability,
Exponential Components - Rubinstein Method

The example of § 7.1.1 used the test data from figure 7-2 to calculate
point estimates of failure rate and reliability.

The example is continued to obtain the upper bound on failure rate and
the lower bound on reliability at the 80% confidence level for each of
the two components and the system.

Equation 7-16 is used to calculate the upper bound on failure rate
[Equation 7-17 is the no-failure case].

In order to use equation 7-16, $\hat{\lambda}$, K, $\beta$ and C are
required. The $\hat{\lambda}$ values were obtained in § 7.1.1; K, the
Normal Deviate, is 0.842 since we selected the 80% confidence level;
$\beta$ may be calculated using equation 5-59 or read directly from
figure 5-25; and C is $\hat{\sigma}^2/\hat{\lambda}$ (Equation 7-8).

$$C_{1h} = \frac{\sum_k (\hat{\lambda}_{1hk}/t_{1hk})}{\hat{\lambda}_{1h}} = \frac{\frac{.02}{88.9} + \frac{.01}{90.9}}{.03} = .011$$

$\beta$ = 1.2725 for 3 failures and 80% confidence (Figure 5-25).
Then, using equations 7-16 and 7-18:

$$\lambda_{u_{1h}} = \frac{2\hat{\lambda}_{1h} + (\beta K)^2 C_{1h} + \sqrt{4\hat{\lambda}_{1h}(\beta K)^2 C_{1h} + (\beta K)^4 C_{1h}^2}}{2}$$

$$= \frac{2(.03) + [(1.272)(0.842)]^2 (.011) + \sqrt{.00167338}}{2} = .0568 \text{ failures/mission}$$

$$R_{L_{1h}} = e^{-\lambda_{u_{1h}}} = e^{-.0568} = .9448$$

Similarly, the calculation for $\lambda_{u_{1v}}$:

$$C_{1v} = \frac{\sum_k (\hat{\lambda}_{1vk}/t_{1vk})}{\hat{\lambda}_{1v}} = \frac{.01/85.7 + 0/100}{.01} = 0.0117$$

Using equation 7-16:

$$\lambda_{u_{1v}} = \max \left\{ \frac{2(.01) + [(1.3684)(0.842)]^2 (.0117) + \sqrt{4(.01)(\beta K)^2 C_{1v} + (\beta K)^4 C_{1v}^2}}{2},\ \frac{[(1.5074)(0.842)]^2}{100} \right\}$$

$$\lambda_{u_{1v}} = \max \{ 0.0325,\ 0.0161 \} \text{ in failures/mission}$$

Therefore:

$$\lambda_{u_{1v}} = 0.0325 \text{ failures/mission}$$
$$R_{L_{1v}} = 0.9680$$

By similar methods:

$$\lambda_{u_{2h}} = 0.0312 \text{ failures/mission}$$
$$R_{L_{2h}} = .9693$$

And for the zero-failure cases, using equation 7-17:

$$\lambda_{u_{2v}} = \frac{(\beta K)^2}{\text{smallest } t_{2vk}} = [(1.507)(0.842)]^2 \left( \frac{1}{100} \right) = .0161 \text{ failures/mission}$$

$$R_{L_{2v}} = e^{-.0161} = .9840$$

Figure 7-8 summarizes the above results at component-environment
level.


                          Best Est.                  80% Conf.
Component  Env.           \hat{\lambda}   \hat{R}    \lambda_U   R_L

1          High Temp.     .0300           .9705      .0568       .9448
           Vib.           .0100           .9901      .0325       .9680

2          High Temp.     .0100           .9901      .0312       .9693
           Vib.           .0000           .9999      .0161       .9840

Figure 7-8. Component Environment Data


To obtain component estimates, using figures 7-2 and 7-8:

\hat{\lambda}_1 = \hat{\lambda}_{1h} + \hat{\lambda}_{1v} = .03 + .01 = .04 failures/mission

\hat{\lambda}_2 = \hat{\lambda}_{2h} + \hat{\lambda}_{2v} = .01 + .00 = .01 failures/mission

And from equation 7-6, with t = 1 mission:

\hat{R}_1 = e^{-.04} = .9608

\hat{R}_2 = e^{-.01} = .9901

Using equation 7-16, with C_1 and C_2:

C_1 = \frac{ \frac{.02}{88.9} + \frac{.01}{90.9} + \frac{.01}{85.7} + \frac{.00}{100.0} }{.04} = .011292

C_2 = \frac{ \frac{.01}{92.3} + \frac{.00}{100.0} + \frac{.00}{100.0} }{.01} = .010834

\lambda_{U_1} = \max \left\{ \frac{2(.04) + [(1.2471)(0.842)]^2 (.011292) + \sqrt{.002147}}{2}, \; \frac{[(1.5074)(.842)]^2}{100} \right\}

= \max \{ 0.0694, \; 0.0161 \} in failures/mission

Therefore:

\lambda_{U_1} = 0.0694 failures/mission

R_{L_1} = 0.9330

Similarly:

\lambda_{U_2} = \max \left\{ \frac{2(.01) + (\beta K)^2 (.010834) + \sqrt{4(.01)(\beta K)^2 (.010834) + (\beta K)^4 (.010834)^2}}{2}, \; \frac{[(1.5074)(.842)]^2}{100} \right\}

= \max \{ 0.0312, \; 0.0161 \} in failures/mission

Therefore:

\lambda_{U_2} = 0.0312 failures/mission

R_{L_2} = 0.9693

Subsystem estimates are calculated:

\hat{\lambda}_s = \hat{\lambda}_1 + \hat{\lambda}_2 = .04 + .01 = .05 failures/mission

\hat{R}_s = e^{-.05} = .9512

C_s = \frac{ \frac{.02}{88.9} + \frac{.01}{90.9} + \frac{.01}{85.7} + \frac{0}{100} + \frac{.01}{92.3} + \frac{0}{100} + \frac{0}{100} }{.05} = .0112002

\hat{\sigma}^2_{\hat{\lambda}_s} = \hat{\lambda}_s C_s = (0.05)(.0112002) = 0.00056

\lambda_{U_s} = .0812 failures/mission

R_{L_s} = e^{-.0812} = .9220


In summary,

Mission
                Best Est.                  80% Conf.
Component       \hat{\lambda}   \hat{R}    \lambda_U   R_L

1               .0400           .9608      .0694       .9330
2               .0100           .9901      .0312       .9693
Subsystem       .0500           .9512      .0812       .9220

completing  the  example. 
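
The component 1 arithmetic of this example can be checked with a short script. This is a minimal sketch, not part of the manual: the function names are illustrative, the form of equation 7-16 is the one recovered from the worked numbers, and reading the second term of the max as the zero-failure bound (\beta K)^2 / T with T = 100 is an assumption.

```python
import math

K_80 = 0.842  # normal deviate used in the text for 80% confidence


def variance_coefficient(entries):
    """C = sum(rate_i / time_i) / sum(rate_i) for (rate, test time) pairs."""
    total_rate = sum(rate for rate, _ in entries)
    return sum(rate / time for rate, time in entries) / total_rate


def upper_bound(lam_hat, c, beta, k=K_80):
    """Equation 7-16 as reconstructed: upper confidence bound on failure rate."""
    a = (beta * k) ** 2 * c
    return (2 * lam_hat + a + math.sqrt(4 * lam_hat * a + a * a)) / 2


def zero_failure_bound(test_time, beta0=1.5074, k=K_80):
    """Assumed floor used in the max: (beta * K)^2 / T."""
    return (beta0 * k) ** 2 / test_time


# Component 1 of the example: (failure rate, test time) for each entry of C_1.
component_1 = [(0.02, 88.9), (0.01, 90.9), (0.01, 85.7), (0.00, 100.0)]

lam_1 = sum(rate for rate, _ in component_1)
c_1 = variance_coefficient(component_1)
lam_u1 = max(upper_bound(lam_1, c_1, beta=1.2471), zero_failure_bound(100.0))
r_l1 = math.exp(-lam_u1)

print(round(c_1, 6), round(lam_u1, 4), round(r_l1, 4))
```

With these inputs the results round to the C_1, upper bound, and lower reliability bound quoted for component 1 in the example.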
Example, Parallel Subsystems with Exponential
Components - Rubinstein Method

Returning to the two-subsystem parallel
system of §7.1.2, which consists of two of
the subsystems described above, the following
point estimate was obtained:

\hat{R}_p = 1 - (1 - \hat{R}_s)^2 = 1 - (1 - 0.9512)^2 = .9976

A conservative approximation of system
reliability in the two-identical-element parallel
case, which is reasonably close for low failure
rates, is

R_p = e^{-\lambda_p} \cong e^{-(\lambda_s)^2}     (7-19)

where \hat{\lambda}_p \cong (\hat{\lambda}_s)^2. In the above example, for
instance,

\hat{R}_p \cong e^{-(.05)^2} = .9975

In the examples given above, the implicit
use of \hat{R}_s^2 or the explicit use of \hat{\lambda}_p as biased
estimators [1] does not seriously affect the
final value of R. If, however, one were to
need many such estimates to estimate the
system reliability of a system composed of
series-parallel subsystems, then a unidirectional
biased estimate for each of the subsystems
could accumulate to a large bias for
the system. It is therefore preferable, if the
illustrated parallel subsystem is only a portion
of the total system, to calculate parallel
reliability and parallel reliability bounds as
follows:

Reliability Lower Bound. Parallel Subsystems
with Exponential Components - Rubinstein
Method

A relatively bias-free estimate of \lambda_s^2 is
given [1, p. 2-14] by:

\widehat{\lambda_s^2} = (\hat{\lambda}_s)^2 - \hat{\sigma}^2_{\hat{\lambda}_s}     (7-20)

This estimate has approximately the variance:

\hat{\sigma}^2_{\hat{\lambda}_p} \cong 4 (\hat{\lambda}_s)^2 \hat{\sigma}^2_{\hat{\lambda}_s}     (7-21)

These approximations permit subsystem
reliability to be estimated. In the example,
quantities previously found are:

\hat{\lambda}_s = 0.05 failures/mission

\hat{\sigma}^2_{\hat{\lambda}_s} = 0.00056

From equation 7-20:

\widehat{\lambda_s^2} = (.05)^2 - .00056 = .00194

From equation 7-19:

\hat{\lambda}_p \cong \widehat{\lambda_s^2} = .00194 failures/mission

A relatively bias free estimate of R_p (from
equation 7-19):

\hat{R}_p = e^{-\hat{\lambda}_p} = e^{-.00194} = .9981

To compute the upper bound on failure rate
an estimate of C_p is required. From equation
7-8:

C_p = \hat{\sigma}^2_{\hat{\lambda}_p} / \hat{\lambda}_p

and using equation 7-21:

C_p \cong 4 (\hat{\lambda}_s)^2 \hat{\sigma}^2_{\hat{\lambda}_s} / \hat{\lambda}_p

\cong 4(.05)^2 (.00056) / .00194
\cong .002887

Then from equation 7-16:

\lambda_{U_p} = \frac{2 \hat{\lambda}_p + (\beta K)^2 C_p + \sqrt{4 \hat{\lambda}_p (\beta K)^2 C_p + (\beta K)^4 C_p^2}}{2}

(\beta K)^2 C_p = [(1.2280)(.842)]^2 (.002887)

= 0.0030865

Solving:

\lambda_{U_p} = \frac{2(.00194) + 0.0030865 + \sqrt{.00003348}}{2}

\lambda_{U_p} = 0.0064 failures/mission

Therefore:

R_{L_p} = e^{-0.0064} = .9936

The approximation can be extended directly
to an n-element parallel configuration p,

\hat{\lambda}_p \cong \prod_{i=1}^{n} \hat{\lambda}_i

or, for n identical \hat{\lambda}_i's,

\hat{\lambda}_p \cong \hat{\lambda}^n

Bias correction equations for parallel
configurations of three or more elements are
cumbersome and of little overall effect in
system calculations where many other elements
are in series. Therefore, the estimates
(\hat{\lambda})^3, (\hat{\lambda})^4, etc. may be used directly in most
such applications.
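
The two-element parallel calculation above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name is hypothetical, equations 7-20, 7-21 and 7-16 are used in the forms recovered from the worked numbers, and the guard against a non-positive corrected estimate is an addition not discussed in the text.

```python
import math


def parallel_pair_bound(lam_s, var_s, beta, k=0.842):
    """Bias-corrected estimate and upper bound for two identical parallel subsystems.

    lam_s: best-estimate failure rate of one subsystem
    var_s: estimated variance of lam_s
    beta:  interpolated coefficient (figure 5-25 in the manual)
    k:     normal deviate for the confidence level (0.842 for 80%)
    """
    lam_p = lam_s ** 2 - var_s            # equation 7-20
    if lam_p <= 0:
        raise ValueError("bias-corrected estimate is not positive")
    var_p = 4 * lam_s ** 2 * var_s        # equation 7-21
    c_p = var_p / lam_p
    a = (beta * k) ** 2 * c_p
    lam_up = (2 * lam_p + a + math.sqrt(4 * lam_p * a + a * a)) / 2  # equation 7-16
    return lam_p, c_p, lam_up, math.exp(-lam_up)


lam_p, c_p, lam_up, r_lp = parallel_pair_bound(0.05, 0.00056, beta=1.2280)
print(round(lam_p, 5), round(c_p, 6), round(lam_up, 4), round(r_lp, 4))
```

Using the example's inputs, the outputs round to the .00194, .002887, .0064 and .9936 obtained above.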

Series-Parallel  System.  Interval  Estimation 
Exponential  Components  -  Rubinstein 
Method 

Consider the following series-parallel system
composed of components 1 and 2 in the
example of §7.1.1.


This system's structure reduces to

[Block diagram: a series group followed by a parallel group]

Best estimates are:

\hat{\lambda}_{s+p} = \hat{\lambda}_s + \hat{\lambda}_p = .05 + .00194
= .05194 failures/mission

\hat{R}_{s+p} = e^{-.05194} = .9494

where the subscripts indicate the series and
parallel groups, respectively.


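As previously estimated, the variance of
the series portion is:

\hat{\sigma}^2_{\hat{\lambda}_s} = .00056

To this must be added the variance of the
parallel portion, estimated as (Equation 7-21):

\hat{\sigma}^2_{\hat{\lambda}_p} \cong 4 (\hat{\lambda}_s)^2 \hat{\sigma}^2_{\hat{\lambda}_s} = 4(.05)^2 (.00056)

= .0000056

giving,

\hat{\sigma}^2_{\hat{\lambda}_{s+p}} = \hat{\sigma}^2_{\hat{\lambda}_s} + \hat{\sigma}^2_{\hat{\lambda}_p} = .00056 + .0000056

= .000566

Then,

C_{s+p} = \hat{\sigma}^2_{\hat{\lambda}_{s+p}} / \hat{\lambda}_{s+p} = .000566 / .05194

= .01090

and,

\lambda_{U_{s+p}} = \frac{2(.05194) + (\beta K)^2 (.01090) + \sqrt{4(.05194)(\beta K)^2 (.01090) + (\beta K)^4 (.01090)^2}}{2}

\lambda_{U_{s+p}} = .0814 failures/mission

R_{L_{s+p}} = e^{-.0814} = .9218
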


7.2.2  Interval  Estimation  for  Systems  of 
Exponential  Components  Using 
Bayesian  Rubinstein  Method 

For systems, Bayesian estimates combine
with test data in a manner precisely analogous
to the Rubinstein method; that is, the
posterior estimates of \hat{\lambda} = (\xi + x)/(\tau + t) and of
\hat{\sigma}^2 = (\xi + x)/(\tau + t)^2 are used in the equations:

\lambda_{U_s} = \frac{2 \hat{\lambda}_s + (\beta K)^2 C_s + \sqrt{4 \hat{\lambda}_s (\beta K)^2 C_s + (\beta K)^4 C_s^2}}{2}
2 
Since non-integer numbers of failures can
appear in the Bayesian formulation, it may be
necessary to compute \beta by linear interpolation
between values given in figure 5-25.
Note that the zero failure case is not usually
applicable since in the Bayesian case \xi, the
number of pseudo failures, is never zero.


Example,  Interval  Estimation,  Exponential 
Components,  Bayesian  Rubinstein  Method 


The Bayesian upper bound on failure rate
is calculated in essentially the same manner
as just illustrated for the Rubinstein method.
For example, using the information contained
in figures 7-4, 7-5, and 7-6 and §7.1.3
the sample calculation at the system level is
illustrated for the prediction and final
(report 3) calculation.


Prediction

C_{prior} = \frac{ \sum_{i=1}^{n} \hat{\lambda}_i / \tau_i }{ \hat{\lambda}_{system} }

from figure 7-4 we have:

C_{prior} = \frac{ \frac{.00040}{50} + \frac{.00670}{180} + \frac{.00600}{1000} + \frac{.00011}{90} }{0.01321}

C_{prior} = 0.00397

and K is 0.842 for 80% confidence. \beta (1.1983)
is found by interpolation in figure 5-25 at
7.23 (pseudo) failures.

\lambda_{U_s} = \frac{2(.01321) + [(1.1983)(.842)]^2 (.00397) + \sqrt{4(.01321)[(1.1983)(.842)]^2 (.00397) + [(1.1983)(.842)]^4 (.00397)^2}}{2}

The predicted upper bound on system failure
rate is:

\lambda_{U_s} = 0.02281 failures/mission

The corresponding lower bound reliability
prediction is:

R_{L_s} = e^{-.02281} = 0.9774


Final Estimate (after report 3, figure 7-5):

C_s = \frac{ \sum_{i=1}^{n} \hat{\lambda}_i / (t_i + \tau_i) }{ \hat{\lambda}_{system} }

From figures 7-4, 7-5 and 7-6:

C_s = \frac{ \frac{.00008}{200+50} + \frac{.00786}{100+180} + \frac{.00600}{500+1000} + \frac{.00002}{400+90} }{.01396}

C_s = 0.00232

\lambda_{U_s} = \frac{2 \hat{\lambda}_s + (\beta K)^2 C_s + \sqrt{4 \hat{\lambda}_s (\beta K)^2 C_s + (\beta K)^4 C_s^2}}{2}

\beta = 1.1660 by interpolation in figure 5-25 at 11.23 failures

\lambda_{U_s} = \frac{2(.01396) + [(1.166)(.842)]^2 (.00232) + \sqrt{4(.01396)[(1.166)(.842)]^2 (.00232) + [(1.166)(.842)]^4 (.00232)^2}}{2}

\lambda_{U_s} = 0.02078 failures/mission

R_{L_s} = e^{-.02078} = .9794


              Failure    Reliability   Failure Rate    Reliability
              Rate                     Upper Bound     Lower Bound

Prediction    .01321     .9869         .02281          .9774
Report 1      .01717     .9823         .02859          .9718
Report 2      .01792     .9822         .02779          .9726
Report 3      .01396     .9861         .02078          .9794

Figure 7-9. System Results (Estimate and 80%
Confidence Bound)
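
The prediction and report 3 calculations above share one formula and differ only in their inputs, which a short script makes explicit. This is a sketch under stated assumptions: the function name is illustrative, the per-component rates and the denominators (prior pseudo test time, or test time plus pseudo test time) are read from the worked example, and the values of \beta are the interpolated ones quoted in the text rather than being looked up.

```python
import math


def bayes_rubinstein_bound(rates, denominators, beta, k=0.842):
    """System upper bound from per-component posterior rates.

    rates:        posterior failure-rate estimates, one per component
    denominators: matching time terms (tau_i for the prior, t_i + tau_i afterwards)
    """
    lam = sum(rates)
    c = sum(r / d for r, d in zip(rates, denominators)) / lam
    a = (beta * k) ** 2 * c
    lam_u = (2 * lam + a + math.sqrt(4 * lam * a + a * a)) / 2
    return lam, c, lam_u, math.exp(-lam_u)


# Prediction (prior only), inputs as quoted from figure 7-4.
prior = bayes_rubinstein_bound(
    [0.00040, 0.00670, 0.00600, 0.00011], [50, 180, 1000, 90], beta=1.1983)

# Final estimate after report 3, inputs as quoted from figures 7-4 to 7-6.
final = bayes_rubinstein_bound(
    [0.00008, 0.00786, 0.00600, 0.00002],
    [200 + 50, 100 + 180, 500 + 1000, 400 + 90], beta=1.1660)

for label, (lam, c, lam_u, r_l) in (("prediction", prior), ("report 3", final)):
    print(label, round(lam, 5), round(c, 5), round(lam_u, 5), round(r_l, 4))
```

Both rows should agree with the Prediction and Report 3 entries of figure 7-9 to the precision printed there.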

Figure  7-10  shows  a  comparison  of  the 
Rubinstein  and  Bayesian  Rubinstein  methods 
for  the  example  given  above.  It  can  be  seen 
that  for  both  the  best  estimate  and  upper 
confidence  limit,  the  earlier  estimates  of 
reliability  using  the  Bayesian  formulation  are 
higher  than  by  the  Rubinstein  approach 
alone; as test data increase, the measured
reliability  using  the  Rubinstein  method 
increases. 


[Plot legend: prediction; best estimate; 80% confidence bound;
curves for Bayes-Rubinstein and Rubinstein]

Figure  7-10.  Comparison  of  Rubinstein  and  Bayesian 
Rubinstein  Techniques  of  Reliability 
Measurement 

7.2.3  Serial,  Parallel,  and  Complex  Systems 
of  Exponential  and  Binomial 
Components  -  Approximation 
Methods 

Ideally, one would like to avail oneself
of a method which can handle a) serial
systems, b) parallel systems, c) serial-parallel
systems, d) complex systems, e) binomial
components, f) exponential components, g)
Weibull and other distributional components,
h) multiple environments, i) multiple test
states, j) mixed truncation testing, and which
is k) tractable, and l) valid over all ranges
of component reliabilities. Unfortunately,
such a method does not exist. The best which
can be hoped for is that a method will include
as many as possible of the attributes a)
through l).

The Rubinstein method presented in
§7.2.1 and derived in [1] includes all
attributes except d) and is untested for g).
The Approximately Optimum (AO) method
of Mann and Fertig [3, p. 517-524], [4] and
Mann and Grubbs [5] makes use of their
discovery that -ln R_s, where R_s is a series
system reliability, can be well approximated
by a non-central chi-square distribution.
The AO method does not include attributes
d), g), h), i) and j), but is applicable to mixed
binomial-exponential components. These two
methods seem to be the very best among
approximation methods.

If only very reliable systems are considered,
and thus if attribute l) is not a requirement,
and if prior engineering knowledge is available,
then it can be used in a non-Bayesian
sense as a weighting factor in the method
presented by Myhre, Rosenfeld and Saunders
[6]. The authors have shown that their results
are insensitive to fairly significant
changes in the weighting factors and claim
applicability to attributes a), b), c), d), e),
and f) [type I censoring only].


7.2.4  Serial  System  of  Exponential 

Components  -  The  Fagan-Wilson 
Simulation  Procedure 

The procedure [7], which is Monte-Carlo,
assumes that each component i follows an
exponential failure model R_i(t) = e^{-\lambda_i t},
and that n independent components form
a system with reliability model,

R_s(t) = f[R_1(t), R_2(t), ... R_i(t), ... R_n(t)]

Tests are assumed to be terminated either
at a fixed time t_0 (type I censoring), or
after a particular failure occurs (type II
censoring). The estimator \hat{\lambda}_i is selected to
be x_i/t_0 in type I censoring. The distribution
of \lambda_i is known to be \chi^2_{2x_i+2} / 2t_0.

The procedure consists of generating by
computer N chi-square samples for each
component, which transform into as many
simulated sets of R_i(t) = e^{-\lambda_i t}, i = 1, 2,
3, ... n. The R_i(t) are stored in memory,
and N simulated values of R_s(t) are generated
through the closed form system reliability
model [7]. These N values of R_s(t) are
ordered and 80% system reliability confidence
bounds are obtained from the 20 percentile
of the resulting histogram.
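
The simulation loop described above can be sketched for a series system as follows. This is an illustrative sketch, not the Fagan-Wilson program: the component data are hypothetical, the 2x + 2 degrees of freedom follow the reading of the type I censoring case given above, and the series product stands in for the general closed form system model f.

```python
import math
import random


def chi_square(df, rng):
    """Chi-square variate with df degrees of freedom (Gamma, shape df/2, scale 2)."""
    return rng.gammavariate(df / 2.0, 2.0)


def simulate_series_reliability(components, mission_time, passes, seed=1):
    """Return sorted simulated system reliabilities for a series system.

    components: list of (failures observed, test time t0) pairs
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(passes):
        r_sys = 1.0
        for failures, test_time in components:
            lam = chi_square(2 * failures + 2, rng) / (2.0 * test_time)
            r_sys *= math.exp(-lam * mission_time)  # series model: product of R_i
        samples.append(r_sys)
    samples.sort()
    return samples


def lower_bound(samples, confidence):
    """Read the (1 - confidence) percentile of the ordered samples."""
    return samples[int((1.0 - confidence) * len(samples))]


# Hypothetical data: (failures, test time in missions) for three components.
components = [(2, 100.0), (1, 90.0), (0, 120.0)]
samples = simulate_series_reliability(components, mission_time=1.0, passes=5000)
bound_80 = lower_bound(samples, 0.80)
print(round(bound_80, 4))
```

For other configurations only the line that combines the component reliabilities would change.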

7.2.5  Interval  Estimation  for  Systems  of

Components  with  Non-Exponential  or 
Different  PDF’s  -  Monte-Carlo 
Simulation 

Even for simple systems of two serial or
two parallel components, closed form or
approximation formulas are cumbersome or
lacking when the components' PDF's are
non-exponential or different. Monte-Carlo
estimates are then used to obtain a lower
reliability bound, an upper bound on failure
rate, and even an MTBF, even though it is only
a point estimate. (Point estimates of reliability,
however, can still be obtained readily
by the methods of Section 7.1.)

The Monte-Carlo procedure, which may
take many different forms, requires a great
deal of sophistication [3], [7], [8]. For
the simple two-component systems illustrated
in figure 7-11, the procedure selected
is as follows:

Step 1 - Draw a vector of random parameters
from the joint PDF of PDF_1 and PDF_2,
the PDF's of component 1 and component 2,
respectively.

Step 2 - Find numerically, with the vector
drawn at step 1 as parameters:

MTBF(series) = E\left[ PDF_1(t) \int_t^\infty PDF_2(\tau) d\tau + PDF_2(t) \int_t^\infty PDF_1(\tau) d\tau \right]

MTBF(parallel) = E\left[ PDF_1(t) \int_0^t PDF_2(\tau) d\tau + PDF_2(t) \int_0^t PDF_1(\tau) d\tau \right]

where E stands for expected value, and

R(series) = \int_t^\infty PDF_1(\tau) d\tau \cdot \int_t^\infty PDF_2(\tau) d\tau

R(parallel) = 1 - \int_0^t PDF_1(\tau) d\tau \cdot \int_0^t PDF_2(\tau) d\tau

Step 3 - For each Monte-Carlo pass, record
the MTBF's and the R's.

Step 4 - Construct a histogram of MTBF's
and R's. The means of the resulting PDF's
are \widehat{MTBF}_s and \hat{R}_s. (As stated before, \hat{R}_s
is not really needed since it can be obtained
directly from \hat{R}_1 and \hat{R}_2.) The 20th percentiles
of the resulting PDF's are the respective
80% bounds on MTBF_s and R_s.

The results of the Monte-Carlo simulation
are tabulated in figure 7-12.

7.2.6  System  Availability  Interval 
Monte-Carlo  Simulation 

A Bayesian Monte-Carlo simulation method
to estimate the lower availability bound at
confidence \gamma for a serial system of N independent
components is presented in this
paragraph.

While a point estimate of availability for
such a system can be obtained by multiplying
the individual estimates of component availability,
such a procedure would be incorrect
if performed with interval estimates. That is:

A_{s_\gamma} \neq \prod_{i=1}^{N} A_{i_\gamma}

where A_{s_\gamma} represents a system availability
bound at confidence \gamma, and A_{i_\gamma} the ith component
availability bound at confidence \gamma.

An interval estimate of system availability
can be obtained by Monte-Carlo simulation
based on data taken at component level.
Input data consist of \hat{\lambda}, the point estimate
of failure rate, \hat{\mu}, the point estimate of repair
rate, x, the number of failures, and m, the
number of repairs.

Case  Configuration  Components                \hat{R}_s   R_L           \widehat{MTBF}_s   MTBF_L
 #                                                         (80%                             (80%
                                                           Confidence)                      Confidence)

(1)   Serial         Exponential  Exponential  0.8755      0.8525        8.030              5.191
(2)   Serial         Exponential  Normal       0.9441      0.9291        2.655              2.260
(3)   Serial         Exponential  Weibull      0.8679      0.8463        15.97              10.55
(4)   Parallel       Exponential  Exponential  0.9961      0.9448        27.69              17.65
(5)   Parallel       Exponential  Normal       0.9997      0.9998        21.67              10.69
(6)   Parallel       Exponential  Weibull      0.9957      0.9945        516.1              277.7

Figure 7-12. Results of Monte-Carlo Simulation


The core of the simulation technique is a
Bayesian view which considers the true
availability as a random variable and synthesizes
its distribution g(A | \hat{\lambda}, \hat{\mu}, x, m) conditioned
on the estimate or, more correctly,
on the data generating the estimate. It has
been shown that if u and v are independent
random variables having chi-square distributions
with 2x and 2m degrees of freedom
respectively, then the ratio (u/2x)/(v/2m)
has an F distribution with 2x and 2m degrees
of freedom. Therefore:

\frac{\lambda}{\mu} = \frac{\hat{\lambda}}{\hat{\mu}} F_{2x,2m}

and an upper confidence bound on \lambda/\mu is

\left( \frac{\lambda}{\mu} \right)_{1-\gamma} = \frac{\hat{\lambda}}{\hat{\mu}} F_{2x,2m,1-\gamma}

Since a lower confidence bound on availability
is given by

A_L = \frac{1}{1 + \left( \frac{\lambda}{\mu} \right)_{1-\gamma}}

it is apparent that sampling from F_{2x,2m} is
equivalent to sampling from g(A | \hat{\lambda}, \hat{\mu}, x, m)
by the transformation shown above. It is a
simple task to sample from any desired F
distribution, beginning with random numbers
R[0,1) distributed uniformly on the interval
zero to one, or beginning with random
numbers N[0,1], distributed normally with
zero mean and unit variance. The transformations
are

N = (\sqrt{-2 \ln R_1}) (\cos 2 \pi R_2)

v = \sum_{i=1}^{k} N_i^2

where v is a chi-square variate with k degrees
of freedom. Two chi-square variates are
formed by sampling normal variates as follows:

v = \sum_{i=1}^{2x} N_i^2

and

u = \sum_{i=1}^{2m} N_i^2

Then an F variate is formed as:

y = \frac{v / 2x}{u / 2m}

Note that y is an F_{2x,2m} variate and v and u
are independent \chi^2_{2x} and \chi^2_{2m} variates [8].
Thus 2x+2m independent normal variates
must be drawn to construct one F variate,
which is then transformed to a sample value A
of availability for the component by:

A = \frac{1}{1 + \frac{\hat{\lambda}}{\hat{\mu}} y}


The computer executes the above algorithm
for each component, stores the results, then
uses the stored values to solve the system
model for system availability. That represents
one pass through the simulation procedure.
Repetition of the process builds up a
histogram of the sample values of system
availability which approaches the shape of
g(A | \hat{\lambda}, \hat{\mu}, x, m) as the number of passes
increases. The desired interval estimate is
obtained simply by reading the appropriate
percentile values.
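
One pass of this algorithm, and its repetition to build the histogram, can be sketched as below. This is a minimal sketch under stated assumptions: the component data are hypothetical, the series product is used as the system model, and the transformation from the F variate to availability, A = 1/(1 + (\hat{\lambda}/\hat{\mu}) y), is the form implied by the lower-bound expression above rather than one printed legibly in the source.

```python
import math
import random


def standard_normal(rng):
    """N[0,1] variate from two uniform numbers, N = sqrt(-2 ln R1) cos(2 pi R2)."""
    r1 = 1.0 - rng.random()  # shifted to (0, 1] so the logarithm is defined
    r2 = rng.random()
    return math.sqrt(-2.0 * math.log(r1)) * math.cos(2.0 * math.pi * r2)


def chi_square(df, rng):
    """Chi-square variate as a sum of df squared normal variates."""
    return sum(standard_normal(rng) ** 2 for _ in range(df))


def sample_availability(lam_hat, mu_hat, x, m, rng):
    """One sample of component availability via an F(2x, 2m) variate."""
    v = chi_square(2 * x, rng)
    u = chi_square(2 * m, rng)
    y = (v / (2 * x)) / (u / (2 * m))
    return 1.0 / (1.0 + (lam_hat / mu_hat) * y)


def simulate_system(components, passes, seed=7):
    """Sorted system availability samples for a serial system."""
    rng = random.Random(seed)
    samples = []
    for _ in range(passes):
        a_sys = 1.0
        for lam_hat, mu_hat, x, m in components:
            a_sys *= sample_availability(lam_hat, mu_hat, x, m, rng)
        samples.append(a_sys)
    samples.sort()
    return samples


# Hypothetical components: (failure rate, repair rate, failures x, repairs m).
components = [(0.01, 0.5, 4, 4), (0.02, 1.0, 6, 5)]
samples = simulate_system(components, passes=2000)
a_lower_80 = samples[int(0.20 * len(samples))]  # 20th percentile, 80% lower bound
print(round(a_lower_80, 4))
```

The sketch requires at least one failure and one repair per component, since the chi-square variates need positive degrees of freedom.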

When  the  repair  time  variable  is  log- 
normally  distributed,  the  inverse  function 
p-'(u)  =  R  can  be  solved  for  the  coefficient 
a  in  the  denominator  below  by  numerical 
integration  on  each  pass,  then  substituted 
into 


Note  that  a  in  the  denominator  is  the  coef¬ 
ficient  of  figure  E-9  of  Appendix  E,  not 
availability. 

Or, much more simply, a log-normal interval
can be formed by sampling the \lambda and \mu
variables independently and forming their
quotient.

\lambda \sim \hat{\lambda} \chi^2_{2x} / 2x


Where  A  is  the  log-normal  distribution  func¬ 
tion,  M  is  the  mean  of  the  corresponding 
normal  distribution,  v  is  the  reciprocal  of 
geometric  mean  corrective  maintenance  time 
and  z  is  the  standard  normal  random  variate. 

ta(i) 

“~exp  £  '"(?) 


As in Appendix C, \sigma^2 = \hat{\sigma}^2 is assumed.
Successive samples of \lambda and \mu are inserted
into the expression for A and its histogram
is built up. Operating arithmetically with
limits computed for each parameter separately
will give a much larger "at least" type interval.
For example, if A_L is computed using
80% limits on \lambda and \mu, the resulting limit
defines a 64% interval for A.

It is easy to read from the histogram a
variety of relevant statistics with standard
errors which are entirely under control,
since they depend only on the number of
passes n. Specifically, one can read the mean
or expected value, the mode or maximum
likelihood value, the median or fifty percent
confidence limit, any desired percentiles in
order to construct one-sided or two-sided
interval estimates, and the range. The
standard error of each of these estimates,
except the mode, is computable by reference
to Kendall [9]. Briefly, the standard
errors are, for the mean \bar{A},

\sigma_{\bar{A}} = \sigma_A / \sqrt{n}

where n is the number of passes, and for the
pth and (100-p)th percentiles,

\sigma_{A_p} = \phi \sigma_{\bar{A}}

Kendall tabulates a few values of the
ratio \phi, which is symmetrical about the
median. When simulating is done often, it
is useful to fit a smooth curve (Figure 7-13)
and express \phi as a function of the desired
percentile [7]. The standard error of the
mode is available with somewhat greater
effort by use of Yasukawa's method [10].

\phi = 1.93637 - 2.86403 p + 2.86403 p^2


[Plot: ratio \phi versus percentile, p]

Figure 7-13. Standard Error of a Percentile as a
Multiple of Standard Error of Mean
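
The fitted curve makes the standard errors easy to evaluate for any number of passes. The following is a small sketch with illustrative function names and input values. It assumes p is expressed as a fraction between 0 and 1, which is the reading under which the fitted quadratic is symmetrical about the median.

```python
import math


def percentile_ratio(p):
    """Fitted ratio phi of percentile standard error to standard error of the mean."""
    return 1.93637 - 2.86403 * p + 2.86403 * p * p


def standard_errors(sigma_a, passes, p):
    """Return (standard error of the mean, standard error of the p-th percentile)."""
    se_mean = sigma_a / math.sqrt(passes)
    return se_mean, percentile_ratio(p) * se_mean


# Hypothetical histogram: standard deviation 0.02 after 400 passes.
se_mean, se_p20 = standard_errors(sigma_a=0.02, passes=400, p=0.20)
print(se_mean, se_p20)
```

Because both errors shrink with the square root of the number of passes, quadrupling the passes halves them.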




NAVSEA  OD  29304B 


7.3  References 

1.  NAVORD OD 29304/Addendum -
Statistical Exposition of the Guide
Manual for Reliability Measurement
Program, 11/15/67.

2.  Kozlov, B. A. and Ushakov, I. A., Reliability
Handbook, International Series in
Decision Processes, Holt Rinehart and
Winston, Inc., 1970.

3.  Mann, Schafer, and Singpurwalla, Methods
for Statistical Analysis of Reliability
and Life Data, John Wiley and
Sons, 1974.

4.  Mann, N. R., and Fertig, K. W., Approximately
Optimum Confidence Bounds on
the Reliability of a Logically Coherent
System, Rocketdyne Research Report
RR 72-02, Rocketdyne, Canoga Park,
CA.

5.  Mann, N. R. and Grubbs, F. E., Approximate
Optimum Confidence Bounds for
System Reliability Based on Component
Test Data, Technometrics, Vol. 16,
No. 3, 8/1974.

6.  Myhre, J. M., Rosenfeld, A. M. and
Saunders, S. C., Determining Confidence
Bounds for Highly Reliable Coherent
Systems Based on a Paucity of Component
Failures.

7.  Fagan, T. and Wilson, M., Monte Carlo
Simulation of System Reliability, Proceedings
of Association for Computing
Machinery Conf., pp. 289-293, 1968.

8.  Segal, R., Generation of Random Numbers
for Monte Carlo Simulations, GE
TIS Report 65SD231, April 1965.

9.  Kendall, M., The Advanced Theory of
Statistics, Vol. 1, Charles Griffin and
Co., London, 1943-1946.

10.  Yasukawa, K., On the Probable Error
of the Mode of Skew Frequency Distributions,
Biometrika 18, pp. 263-292.




Section  8 

RELIABILITY  DEMONSTRATION 


8.0  INTRODUCTION 

Reliability demonstration is appropriate for
newly designed equipment, equipment that
has been modified, and equipment that is of
unproven reliability or of previously unacceptable
reliability. It is best performed at the
highest feasible assembly level using equipment
as close to the production configuration
as possible.

Selection and scheduling of the demonstration
test(s) should be an integral part of
the Integrated Test Program (ITP), with
completion of the demonstration tests and, if
applicable, retests, to occur prior to starting
the production program.

This section provides guidance for approaching
reliability demonstration in an
orderly and timely way by giving stepwise
information on the conduct of a reliability
demonstration program from categorization
of equipments for demonstration through
reporting the final results of the demonstration
test.

Minimum  contents  of  the  demonstration 
test  plan  are  provided  (§  8.2.2)  along  with  a 
“road  map”  of  the  reliability  demonstration 
process  (Figure  8-1).  Additional  details  on 
content  requirements  of  the  demonstration 
test  plan  are  provided  in  §  10.1.1.2  and 
NAVORD  OD  42282. 

A  brief  discussion  on  the  background  of 
reliability  demonstration  is  also  provided. 

8.1  BACKGROUND 

The validity of any demonstration depends
on the statistical regularity of the process
observed, that is, on the variability of the
sample system(s) in repeated tests and operations,
similarity of later systems to those comprising
the sample, control of test or operating
conditions, accuracy of performance
measurements, integrity of test results and
consistent definition of failure and success
in the tests. It also depends to some extent on
how well the demonstration model reflects
the actual factors that influence system
performance.

The measure of reliability most applicable
to a particular system and mission should be
the specified reliability parameter and the
basis for the reliability demonstration. For
systems having constant failure rates (exponentially
distributed failure times) the customary
measures are MTBF or failure rate.
Other measures, such as probability of failure
or reliability, are customary for one-shot
devices, cyclic equipment, and systems with
failure times not exponentially distributed.

Traditionally, reliability demonstration has
been implemented as hypothesis testing.
When this approach is taken, it is possible to
estimate the sample size necessary to achieve
reliability demonstration with the agreed
upon risks. Although interval estimation and
hypothesis testing are related, interval estimation
cannot address the determination of
sample size to satisfy specified decision risks.
These risks are commonly referred to as the
producer's risk, \alpha (the probability of a test
rejecting an item which complies with the
design objective), and the consumer's risk, \beta
(the probability of a test accepting an item
which has the minimum reliability). These
two specified reliability parameters, design
objective and minimum reliability, are equivalent
to specifying MTBFs (\theta_0 and \theta_1) for
equipments following an exponential distribution.
The relationships of \theta_0 and \theta_1 to
\alpha and \beta are represented in figure 8-2.
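
For a fixed-length test of exponential equipment, both risks follow from the Poisson distribution of the number of failures in the test time. The sketch below is illustrative only: the plan parameters are hypothetical and are not taken from MIL-STD-781, and the accept rule (accept when the failure count does not exceed an accept number) is an assumed form of a fixed-length test.

```python
import math


def poisson_cdf(count, mean):
    """P(N <= count) for a Poisson variable with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, count + 1):
        term *= mean / k
        total += term
    return total


def decision_risks(test_time, accept_number, theta0, theta1):
    """Producer's and consumer's risks of a fixed-length test.

    The equipment is accepted when failures <= accept_number.
    theta0: design-objective MTBF, theta1: minimum acceptable MTBF.
    """
    producers_risk = 1.0 - poisson_cdf(accept_number, test_time / theta0)
    consumers_risk = poisson_cdf(accept_number, test_time / theta1)
    return producers_risk, consumers_risk


# Hypothetical plan: theta1 = 100 h, discrimination ratio 3, 430 h test, accept on <= 2 failures.
alpha, beta = decision_risks(test_time=430.0, accept_number=2, theta0=300.0, theta1=100.0)
print(round(alpha, 3), round(beta, 3))
```

Lengthening the test at a fixed accept number lowers the consumer's risk and raises the producer's risk, which is why test length and accept number are chosen together.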

When demonstration consists of hypothesis
testing, the reliability measure of interest is
treated as a constant system parameter not
known with precision. The hypothesis testing
approach is useful when an adequate test
sample is available and when schedules permit
the extended testing typically needed. Moreover,
hypothesis testing is conceptually valid
when the system design is fixed and no
product improvement program is underway.
Military specifications such as MIL-STD-781[1]
embody the hypothesis testing approach
to demonstration.

Figure 8-1. Reliability Demonstration Process

MIL-STD-781[1] applies to devices having
constant failure rates (exponentially distributed
times between failure). It provides fixed
length tests based on the chi-square interval
method and sequential (variable length) tests
based on Wald's sequential testing method. In
the C revision of MIL-STD-781[1], fixed
length tests vary in length from 1.1 to 45
times the minimum specified MTBF, and have
\alpha and \beta risks ranging between 10 and 30 percent.
Sequential tests select between two
alternate hypotheses but do not provide an
estimate of the MTBF expected in service.
Thus, sequential tests are not directly applicable
when such estimates are needed; however,
the data from the tests can be used in
other models to obtain the MTBF estimate.
The total test time (hence also cost) of a
sequential demonstration is a random (though
bounded) variable. Usually, however, sequential
tests require less time to complete than
fixed length tests of equal power. The highest
risk sequential test plan of MIL-STD-781C[1]
requires from 1.72 to 4.5 times the minimum
acceptable (lower test) MTBF (\theta_1). Both
demonstration approaches of MIL-STD-781[1]
are discussed in § 8.3.

When a standard test of MIL-STD-781[1]
is invoked by specification, it is general practice
to specify the minimum acceptable reliability
(R_1) or MTBF (\theta_1), the test selected,
and the environmental level at which testing
will be performed. This constitutes a complete
specification because the test selected
contains the discrimination ratio (\theta_0/\theta_1) and
the \alpha and \beta risks.

Trade-off studies, based on such things as
cost, schedule, test samples, need to determine
design stability and risk, should be conducted
to determine the type of test to be
selected.

A more recent concept of demonstration,
which is gaining increasing favor, applies to
systems undergoing progressive modifications
which are taken in response to design improvements
and corrective actions for early
failures. Reliability and availability measures
are treated as variable system properties that
improve or grow primarily as a result of
progressive weeding out of failure mechanisms
by corrective actions as time and failures accumulate.
A basic growth model was described
empirically in 1964 by J. T. Duane[2],
has subsequently gained wide acceptance, and
is reflected in standards such as MIL-STD-1635[3]
for test-analyze-and-fix (TAAF) programs.
Statistical limitations of Duane's
model were addressed by Dr. Lawrence
Crow[4]. Dr. Crow derived more rigorous
estimates of the model's parameters, which are
discussed in § 5.4.4. Additional detail is provided
in draft MIL-STD-781D[1].

8.2  DEMONSTRATION  TESTING 

Reliability demonstration testing consists
of three major steps: Equipment Categorization,
Test Plan Development, and Test Implementation.
Figure 8-1 outlines the demonstration
process and the following paragraphs
provide discussion of equipment
categories, test criteria, and test plans.

8.2.1  Procedure 

The first step in the demonstration process
is the determination of the equipment for
which demonstration is required. This is often
clearly defined in contractual specifications.
Reliability demonstration equipment categories
are indicated in figure 8-3. Equipment
included in category A should normally be
subjected to MIL-STD-781[1] testing. Equipment
in category B (e.g. one-shot devices) requires
alternate demonstration plans which
meet the intent to demonstrate required
reliability prior to the production program.
Equipment for which there are sufficient
patrol data to document and substantiate
that the required specified reliability has been
achieved may fall in category C. Equipment
in category C should not be required to have
a formal demonstration of reliability.

The second step in the demonstration process
is preparation of the Reliability Demonstration
Test Plan (§ 10.1.1.2). In all cases
this plan must discuss at least:

Quantitative requirements, § 8.2.2.1,

Test environment, § 8.2.2.2,

Test article selection, § 8.2.2.3,

Test type and length, § 8.2.2.4,

Failure criteria, § 8.2.2.5, and

Sample size, § 8.2.2.6.

Test methods are described in § 8.3 for
mature (category A) systems and equipments
with constant failure rate. Non-exponential
reliability demonstration procedures for
mature (category B) systems and equipments
are covered in § 8.4. A reliability growth
demonstration procedure is discussed in
§ 8.5.


Equipment
Category    Definition

A           Equipment to which MIL-STD-781[1] testing (§ 8.3) applies:
              1. Equipment contract requires test
              2. Exponential distribution applies
              3. New Design
              4. Modified Design
              5. Existing Design with unproven or previously
                 unacceptable reliability

B           Equipment to which non-MIL-STD-781[1] testing (§ 8.4)
            applies:
              1. Equipment contract requires test
              2. Exponential distribution does not apply
              3. New Design
              4. Modified Design
              5. Existing Design with unproven or previously
                 unacceptable reliability

C           Equipment now in use aboard submarines which has exhibited
            reliability levels equal to or exceeding requirements of the
            TOG or the subsystem spec.

Figure 8-3. Reliability Demonstration
Equipment Categories

The  third  step  in  the  demonstration  process 
is  to  conduct  the  test  in  accordance  with  the 
approved  plan  and  report  the  results.  Report¬ 
ing  is  discussed  in  §  10.1.2.3. 

8.2.2  Development  of  Test  Plan/Test  Criteria 

8.2.2.1 Quantitative Requirements

Design requirements for reliability are derived from the top level requirements for the system which contains the equipment to be tested. Such requirements are based on strategic (mission) objectives and the system reliability objective as apportioned to lower indenture levels. This process establishes meaningful reliability objectives for design to achieve the weapon system objectives.

Minimum  acceptable  reliability  objectives 
are  established  by  the  user  (operator)  and  are 
based  on  operational  constraints  such  as  avail¬ 
ability,  reliability  and  logistics  capabilities 
which  affect  the  user’s  ability  to  successfully 
accomplish  the  mission.  Minimum  acceptable 
reliability  is  understood  as  the  test  require¬ 
ment  to  be  demonstrated  with  statistical  con¬ 
fidence. 

The FBMWS/SWS and constituent subsystems reliability design objectives are specified in the pertinent TOG[11]. Subsystem values are subsequently allocated to lower level system elements, as required, and documented in product specifications. The minimum acceptable reliability for the FBMWS/SWS, established by the CNM[12], is also allocated to lower system elements as required for demonstration test planning. Realistic values for $\alpha$ and $\beta$ are determined by coordinated consideration of producer/consumer needs, test costs, and other program constraints, e.g. schedules. A complete specification for reliability demonstration is established when quantitative values are assigned to the four parameters shown in figure 8-4.

Quantities Specified

Design Objective            Producer's Risk    Minimum Reliability    Consumer's Risk
$R_0$                       $\alpha$           $R_1$                  $\beta$
(TOG or Subsystem Spec)     (Coordinated)      (NAVMAT)               (Coordinated)

Figure 8-4. Complete Reliability Specification


8.2.2.2 Test Level

Results of the mission analysis should be used to determine the environmental conditions and duty cycle to be proposed in the demonstration plan. If a MIL-STD-781[1] test is to be performed, the results of the mission analysis enable the selection of the appropriate test category (e.g., Category 3A, Shipboard Equipment Sheltered, of MIL-STD-781C[1]) or justification for a modified test level. If the mission analysis indicates that the MIL-STD-781[1] categories must be modified, then a unique test level shall be established and documented in the test plans and procedures. The Reliability Demonstration Test Plan must address any differences between test environment and mission environments and the effect of these differences on demonstrated reliability.

8.2.2.3  Selection  of  Test  Article 

Reliability  demonstration  tests  should  be 
performed  using  samples  of  intended  produc¬ 
tion  (i.e.,  manufactured  to  production  draw¬ 
ings  using  production  tooling  on  production 
lines,  and  inspected  and  tested  to  approved 
procedures  using  production  test  and  mea¬ 
surement  equipment). 

The  demonstration  plan  should  clearly 
specify  the  configuration  which  will  enter  the 
test.  The  test  plan  should  identify  any  dif¬ 
ferences  between  the  production  configura¬ 
tion  and  the  test  sample,  and  the  effect  of 
these  differences  on  demonstrated  reliability. 

8.2.2.4 Test Plan Selection

For equipment in Category A (Figure 8-3) a test plan is selected from MIL-STD-781[1]. It is recognized that when values specified for $R_0$ and $R_1$ (Figure 8-4) are converted to $\theta_0$ and $\theta_1$, they will not, in general, yield discrimination ratios ($\theta_0/\theta_1$) which correspond exactly to the ratios contained in MIL-STD-781[1]. Therefore, some adjustments may be required in selecting the test plan which best satisfies the requirement. Since MIL-STD-781[1] has variable length [probability ratio sequential test (PRST)] and fixed length test plans, trade-off studies should be conducted to determine the type of test which should be performed. Bases for the trade-off studies include cost, schedule, need to determine design stability, and risk. PRST plans have generally been preferred to fixed length test plans. Category B equipment is discussed in § 8.4.
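The conversion mentioned above can be sketched as follows. This is a minimal illustration, assuming the exponential relation $R = e^{-T/\theta}$ for a mission of length T; the function names and the numerical values (reliabilities of 0.95 and 0.90, a 10-hour mission) are hypothetical and are not taken from any specification.

```python
import math


def mtbf_from_reliability(reliability: float, mission_time: float) -> float:
    """Return the MTBF theta for which exp(-mission_time / theta) equals reliability.

    Assumes an exponential time-to-failure distribution (Category A equipment).
    """
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    return -mission_time / math.log(reliability)


def discrimination_ratio(r0: float, r1: float, mission_time: float) -> float:
    """Return theta0 / theta1 for design objective r0 and minimum reliability r1."""
    theta0 = mtbf_from_reliability(r0, mission_time)
    theta1 = mtbf_from_reliability(r1, mission_time)
    return theta0 / theta1


# Hypothetical inputs, for illustration only.
theta0 = mtbf_from_reliability(0.95, 10.0)
theta1 = mtbf_from_reliability(0.90, 10.0)
ratio = discrimination_ratio(0.95, 0.90, 10.0)
print(f"theta0 = {theta0:.1f} h, theta1 = {theta1:.1f} h, ratio = {ratio:.3f}")
```

With these inputs the ratio is slightly above 2, so the analyst would have to choose between the nearest standard plan and an adjusted requirement, which is the kind of adjustment the paragraph describes.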

8.2.2.5 Failure Criteria

Failure criteria for each equipment to be tested during the demonstration test must be included in the demonstration plan. The failure categories defined in MIL-STD-781[1] should be used to guide this effort. Particular consideration should be given to the recognition and treatment of procedural errors (e.g., operator induced, inadequate documentation) and software errors. All failures are relevant failures unless proven otherwise.

8.2.2.6  Sample  Size 

Determine  the  quantity  of  each  equipment 
to  be  used  in  the  reliability  demonstration 
test  program.  The  sample  size  required  should 
be  based  upon  the  expected  duration  of  the 
test  for  each  equipment,  required  completion 
date,  possible  number  of  test  articles  avail¬ 
able,  and  test  facilities  capability.  It  should 
also  consider  statistical  variability  (i.e.,  it  is 
desirable  to  test  a  sample  large  enough  to  of¬ 
fer  some  assurance  that  the  tested  items  are 
representative  of  the  population)  and  the 
need  to  establish  design  stability  [the  effects 
of  time  and  environments  (not  considered 
life  tests)] .  Equipments  characterized  by  long 
operating  times  and  comparably  high  times 
between  failures  should  be  tested  in  quantity 
to  obtain  the  maximum  amount  of  test  time 
and  information  in  the  shortest  calendar  time. 
Note that MIL-STD-781[1] requires that each test sample operate at least one half the average operating time of all equipments on test. A frequent drawback in reliability demonstration testing is that a sample size of one is often used. Better test planning could alleviate this problem in many cases.

8.3 MIL-STD-781[1] DEMONSTRATION TESTS

The test methods described in this paragraph are for equipment that exhibits an exponential distribution of time-to-failure.

8.3.1  Hypothesis  Testing  (Wald’s  Probability 
Ratio  Sequential  Method) 

Sequential tests permit one of three decisions to be made after each observation: accept the test hypothesis, reject the test hypothesis and accept an alternative hypothesis, or continue testing.

Typically the test hypothesis is $\theta$ (the true MTBF) $= \theta_0$ (the design value of MTBF). The alternative hypothesis is $\theta = \theta_1$ (the minimum specified value). If $\theta = \theta_0$ the series of observations $x_1, x_2, x_3, \ldots, x_n$ is distributed with probability density

$$p_0(n) = \prod_{i=1}^{n} f(x_i, \theta_0)$$

where $f(x, \theta_0)$ is the density function applicable to the variable x with parameter $\theta_0$. Likewise, if $\theta = \theta_1$,

$$p_1(n) = \prod_{i=1}^{n} f(x_i, \theta_1)$$

Thus Wald's method is applicable to any attribute or variable the density function of which is known or assumable a priori.

After each observation the ratio $p_1(n)/p_0(n)$ is tested in the inequality

$$\frac{\beta}{1-\alpha} < \frac{p_1(n)}{p_0(n)} < \frac{1-\beta}{\alpha}$$

where $\alpha$ and $\beta$ are type I and type II risks respectively. Testing is continued until $p_1(n)/p_0(n)$ fails to satisfy the inequality. The test always converges to a decision and generally requires about half the test time of a fixed length demonstration of equal power.

8.3.1.1 Accept and Reject Criteria

When t is time to failure and $f(t, \theta)$ is an exponential density function, the following test values are computed.

Accept y intercept:

$$+h_0 = +\frac{-\ln\dfrac{\beta}{1-\alpha}}{\dfrac{1}{\theta_1} - \dfrac{1}{\theta_0}} \tag{8-1}$$

Reject y intercept:

$$-h_1 = -\frac{\ln\dfrac{1-\beta}{\alpha}}{\dfrac{1}{\theta_1} - \dfrac{1}{\theta_0}} \tag{8-2}$$

Slope:

$$s = \frac{\ln(\theta_0/\theta_1)}{\dfrac{1}{\theta_1} - \dfrac{1}{\theta_0}} \tag{8-3}$$

The equations for the accept and reject lines are respectively:

$$T = h_0 + sx \tag{8-4}$$

$$T = -h_1 + sx \tag{8-5}$$

After total unit test time T with x failures observed, decisions are made in accordance with the criteria

Reject if $T < (-h_1 + sx)$

Accept if $T > (h_0 + sx)$

Continue testing if $(-h_1 + sx) < T < (h_0 + sx)$.

The expected total test time to a terminal decision point is:

For $\theta = \theta_0$:

$$E[T] = \theta_0\,\frac{(1-\alpha)\ln\dfrac{\beta}{1-\alpha} + \alpha\ln\dfrac{1-\beta}{\alpha}}{\ln(\theta_0/\theta_1) + 1 - (\theta_0/\theta_1)} \tag{8-6}$$

For $\theta = \theta_1$:

$$E[T] = \theta_1\,\frac{\beta\ln\dfrac{\beta}{1-\alpha} + (1-\beta)\ln\dfrac{1-\beta}{\alpha}}{\ln(\theta_0/\theta_1) - 1 + (\theta_1/\theta_0)} \tag{8-7}$$

Figure 8-5 is an example of a sequential plan with discrimination ratio $(\theta_0/\theta_1) = 2.0{:}1$ and decision risk ($\alpha = \beta = 20$ percent). The plan is obtained as follows:

From equation 8-1:

$$+h_0 = \frac{\ln\dfrac{1-.20}{.20}}{\dfrac{1}{\theta_1} - \dfrac{1}{2\theta_1}} = +1.3863\,(2\theta_1) \quad \text{since } \theta_0 = 2\theta_1$$

$$+h_0 = +2.7726\,\theta_1$$

Similarly, from equation 8-2:

$$+h_1 = +2.7726\,\theta_1$$

Using equation 8-3:

$$s = +2\theta_1 \ln 2 \quad \text{where } \theta_0 = 2\theta_1$$

$$s = +1.3863\,\theta_1$$

Therefore the accept line is

$$T = h_0 + xs = 1.3863\,(x+2)\,\theta_1$$

and the reject line is

$$T = -h_1 + xs = 1.3863\,(x-2)\,\theta_1$$

MIL-STD-781[1] plots failures on the y axis and test time on the x axis. Solving equations 8-4 and 8-5 for x

$$x = \frac{T - h_0}{s} \tag{8-8}$$

$$x = \frac{T + h_1}{s} \tag{8-9}$$

and inserting the values for $h_0$, $h_1$ and s gives

$$x = \frac{T - h_0}{s} = \frac{T}{1.3863\,\theta_1} - 2$$

$$x = \frac{T + h_1}{s} = \frac{T}{1.3863\,\theta_1} + 2$$

This information is plotted in figure 8-6. Figure 8-7 is sequential test plan IVC of MIL-STD-781C[1]. This is similar to figure 8-6 except that the equation for the reject line used in the MIL-STD is

$$x = \frac{T}{1.3863\,\theta_1} + 1.5$$

(i.e., the reject line dropped 0.5 failures) and truncation lines have been added.

[Figure 8-6. Sequential Test Plan, Failures Versus Test Time. Discrimination ratio 2:1; decision risks 20%. Vertical axis: number of failures; horizontal axis: test time in multiples of $\theta_1$; the band between the two lines is labeled "continue to test".]

[Figure 8-7 (plot not reproduced). Decision risks (nominal) 20 percent; discrimination ratio 2.0:1.]

Figure 8-8. Thorndike Chart

MIL-STD-781 does not indicate the procedure used in developing these truncation lines. A method for truncation is provided in NAVORD OD 41146[8] and in § 8.3.1.2.
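The accept and reject criteria of § 8.3.1.1 can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the value $\theta_1 = 100$ hours is an assumed number chosen to make the output concrete, and the strict inequalities follow equations 8-4 and 8-5 as written above (a point exactly on a line is treated as "continue").

```python
import math


def prst_parameters(theta0: float, theta1: float, alpha: float, beta: float):
    """Return (h0, h1, s) per equations 8-1, 8-2 and 8-3 (exponential case)."""
    if theta0 <= theta1:
        raise ValueError("theta0 must exceed theta1")
    denom = 1.0 / theta1 - 1.0 / theta0
    h0 = math.log((1.0 - alpha) / beta) / denom   # accept intercept, eq 8-1
    h1 = math.log((1.0 - beta) / alpha) / denom   # reject intercept, eq 8-2
    s = math.log(theta0 / theta1) / denom         # slope, eq 8-3
    return h0, h1, s


def prst_decision(total_time: float, failures: int, h0: float, h1: float, s: float) -> str:
    """Apply the accept / reject / continue criteria to (T, x)."""
    if total_time > h0 + s * failures:
        return "accept"
    if total_time < -h1 + s * failures:
        return "reject"
    return "continue"


THETA1 = 100.0  # assumed minimum MTBF in hours, for illustration only
h0, h1, s = prst_parameters(2.0 * THETA1, THETA1, alpha=0.20, beta=0.20)
print(f"h0 = {h0 / THETA1:.4f} theta1, h1 = {h1 / THETA1:.4f} theta1, s = {s / THETA1:.4f} theta1")
print(prst_decision(500.0, 1, h0, h1, s))
print(prst_decision(300.0, 1, h0, h1, s))
print(prst_decision(100.0, 3, h0, h1, s))
```

Expressed in multiples of $\theta_1$, the computed intercepts and slope agree with the 2.7726 and 1.3863 values of the worked example.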

8.3.1.2 Truncation Criteria

Figure 8-8 is a Thorndike Chart which is used to establish truncation lines according to the following steps:

1. Define "Probability of x or Less Failures" ordinates for Figure 8-8 corresponding to $(1-\alpha)$ and $\beta$. For this example the ordinates will be 0.8 and 0.2 respectively.

2. Determine from figure 8-8 abscissa values of np corresponding to the ordinates of step 1 and values of x (number of failures). Enter these values in a table (figure 8-9 for the example) starting with x = 0.

3. Calculate the ratio $np_{(1-\alpha)}/np_{\beta}$ for each value of x used in step 2 until you find the smallest value of x that will give a ratio $np_{(1-\alpha)}/np_{\beta}$ that is greater than the $\theta_1/\theta_0$ used in the sequential test plan being considered. In this case $\theta_1/\theta_0 = .50$ and the corresponding value of x is 6 failures. (See figure 8-9, value 0.511).

4. Truncate the test at the number of failures ($x_0$) equal to one plus the value determined in step 3. (In the example $x_0 = 7$ failures).

5. Test time is truncated at $x_0$ times the slope of the accept/reject lines, in this example $7 \times 1.3863\,\theta_1$, or $9.7\,\theta_1$. Figure 8-7 shows that MIL-STD-781[1] truncated this test at $9.74\,\theta_1$ and 7.1 failures.

x                              0      1      2      3      4      5      6      7
$np_{(1-\alpha)}$              0.22   0.80   1.50   2.30   3.00   3.80   4.60   5.60
$np_{\beta}$                   1.50   3.00   4.20   5.40   6.60   8.00   9.00   10.00
$np_{(1-\alpha)}/np_{\beta}$   0.146  0.267  0.357  0.426  0.455  0.475  0.511  0.560

Figure 8-9. Failures Required for Truncation

Another criterion sometimes used for truncation is to use three times the number of failures required in a fixed length test (§ 8.3.2) with the same risks.
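The Thorndike-chart steps can be approximated numerically. The sketch below is an assumption-laden illustration, not the procedure of NAVORD OD 41146: it replaces the chart reading with a bisection on the cumulative Poisson distribution, so the np values are exact rather than read from a graph and differ slightly from figure 8-9. All function names are hypothetical.

```python
import math


def poisson_cdf(x: int, mean: float) -> float:
    """Probability of x or fewer events for a Poisson variable with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, x + 1):
        term *= mean / k
        total += term
    return total


def mean_for_cdf(x: int, prob: float) -> float:
    """Find np such that P(X <= x) = prob (the CDF decreases as np increases)."""
    lo, hi = 0.0, 100.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(x, mid) > prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def truncation_failures(alpha: float, beta: float, theta0: float, theta1: float,
                        max_x: int = 50) -> int:
    """Steps 1 to 4: smallest x whose np ratio exceeds theta1/theta0, plus one."""
    target = theta1 / theta0
    for x in range(max_x + 1):
        ratio = mean_for_cdf(x, 1.0 - alpha) / mean_for_cdf(x, beta)
        if ratio > target:
            return x + 1
    raise ValueError("no truncation point found up to max_x failures")


x0 = truncation_failures(alpha=0.20, beta=0.20, theta0=2.0, theta1=1.0)
slope_in_theta1 = 2.0 * math.log(2.0)      # s / theta1 from equation 8-3 with theta0 = 2 theta1
truncation_time = x0 * slope_in_theta1     # step 5, in multiples of theta1
print(x0, round(truncation_time, 2))
```

For the 2:1, 20 percent example this reproduces the truncation at seven failures and about 9.7 $\theta_1$ found in steps 4 and 5.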

8.3.2 Chi-Square Method (Fixed Length Tests)

MIL-STD-781[1] has nine fixed length tests and three additional high risk fixed length tests. The chi-square method can be used when other fixed length test plans are to be developed for reliability demonstration. Specifications such as those in § 8.1 or figure 8-4 can be used to determine these tests.

Demonstration consists of collecting sufficient test or operational data to accept or reject statements about the levels of the parameters; thus it is basically a hypothesis testing procedure. A demonstration plan can be developed from the applicable statistical formula for interval estimation. The planning task consists of determining combinations of test time, sample size and the maximum number of failures which will satisfy the hypothesis test.

The chi-square method applies when all of three conditions are met:

1) Demonstration is performed at the same assembly level at which interval estimates are to be made (i.e., the entire device is tested as a unit).

2) The device is not cyclic or "one shot", but operates more or less continuously in time, so that failure rate or MTBF is the parameter of interest.

3) The device is described by the exponential failure density function $f(t, \lambda) = \lambda e^{-\lambda t}$, $(t > 0,\ \lambda > 0)$ (i.e., the device has a constant failure rate $\lambda$).

Given that the foregoing conditions are met, the best (point) estimate of failure rate is $\hat{\lambda} = x/t$ where t is the sum of the operating times accumulated by all devices in the tests and x is the number of failures observed.

For a test with a fixed truncation time, where t units of operating time are accumulated, it can be shown [6] that the probability of observing x or fewer failures is given by the cumulative Poisson distribution

$$P(x) = \sum_{k=0}^{x} \frac{(\lambda t)^k e^{-\lambda t}}{k!}$$

which may be described by the $\chi^2$ distribution:

$$P(x) = P(\chi^2_{2x+2} > 2\lambda t)$$

where 2x+2 is the number of degrees of freedom of the $\chi^2$ variate.

Therefore, if $\lambda > \chi^2_{(1-\beta);\,2x+2}/2t$, then $P(x) < \beta$. Here $\chi^2_{(1-\beta);\,2x+2}$ is the $100(1-\beta)$th percentage point of the $\chi^2$ distribution with 2x+2 degrees of freedom and the entire expression is a null hypothesis which we seek to test. Having observed x failures, we may then associate $100(1-\beta)$ percent confidence with the alternative hypothesis that

$$\lambda < \frac{\chi^2_{(1-\beta);\,2x+2}}{2t} \tag{8-10}$$

Epstein [7] has shown that equation 8-10 is a one-sided confidence interval on $\lambda$. It is customary to define

$$\lambda_{U(1-\beta)} = \frac{\chi^2_{(1-\beta);\,2x+2}}{2t}$$

as a $100(1-\beta)$ percent upper confidence limit on $\lambda$.

If MTBF ($\theta$) is the parameter of interest, the corresponding one-sided interval is

$$\theta > \frac{2t}{\chi^2_{(1-\beta);\,2x+2}} \tag{8-11}$$

and from equation 8-11 we define:

$$\theta_{L(1-\beta)} = \frac{2t}{\chi^2_{(1-\beta);\,2x+2}}$$

as a $100(1-\beta)$ percent lower confidence limit on $\theta$.

If reliability for a mission of length T is to be demonstrated, we define

$$R_{L(1-\beta)}(T) = e^{-\lambda_{U(1-\beta)} T} = e^{-T/\theta_{L(1-\beta)}}$$

as a $100(1-\beta)$ percent lower confidence limit on R(T).

In computing upper confidence limits on $\lambda$, or lower limits on $\theta$, the degrees of freedom are 2(x+1) since the test time is fixed in advance.

The test time t may be accumulated in a replacement or non-replacement test, or from combined, interrupted or sequential tests. Mathematically, sample size and test duration are directly exchangeable. All that is required is that $t = \sum t_i$, the sum of operating times of all items tested, and that $x = \sum x_i$, the total number of failures observed in the tests. Demonstration is complete when the interval estimate encompasses $\theta_1$, the minimum acceptable MTBF, or $R_1$, the minimum acceptable reliability.

From an engineering viewpoint however, it is desirable to test a sample large enough to offer some assurance that the tested items are representative of the population. It is also desirable, though not always feasible, to continue testing long enough to show that wearout (time and environment) effects are not significant in the period during which reliability is of concern. Both wearout and infant mortality effects tend to prevent a system from exhibiting the exponential distribution of times between failures on which the chi-square method depends.
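The interval estimates of this paragraph can be sketched without chi-square tables. The code below is a hedged illustration, not the manual's procedure: it uses the equivalence between the chi-square and cumulative Poisson forms stated in this paragraph and finds the upper limit on failure rate by bisection. Function names are hypothetical; the inputs (300 hours, 3 failures, risk 0.20) are those of the worked example in this paragraph.

```python
import math


def poisson_cdf(x: int, mean: float) -> float:
    """Probability of x or fewer failures when the expected number is `mean`."""
    term = math.exp(-mean)
    total = term
    for k in range(1, x + 1):
        term *= mean / k
        total += term
    return total


def failure_rate_upper_limit(failures: int, total_time: float, beta: float) -> float:
    """Upper limit on lambda with 100(1 - beta) percent confidence.

    Solves P(x or fewer failures | lambda * t) = beta, which is equivalent to
    lambda = chi-square percentage point (2x + 2 degrees of freedom) / 2t.
    """
    lo, hi = 0.0, 10.0 * (failures + 10)   # bracket for the expected failure count
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(failures, mid) > beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / total_time


t, x, beta = 300.0, 3, 0.20
theta_hat = t / x                                   # point estimate of MTBF
lambda_u = failure_rate_upper_limit(x, t, beta)     # upper limit on failure rate
theta_l = 1.0 / lambda_u                            # lower limit on MTBF, eq 8-11
r_hat = math.exp(-1.0 / theta_hat)                  # one-hour reliability estimate
r_l = math.exp(-lambda_u * 1.0)                     # one-hour lower limit
print(round(theta_hat, 1), round(theta_l, 2), round(r_hat, 4), round(r_l, 4))
```

A table lookup of the chi-square percentage point gives the same limits; the bisection is used here only to keep the sketch free of external libraries.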

Example. Reliability Demonstration with Interval Estimation. Chi-Square Method

Given: t = 300 hours, x = 3 failures, $\beta = .20$

Find: $\hat{\theta}$, $\theta_L$, $\hat{R}(1)$ and $R_L(1)$

$$\hat{\theta} = \frac{t}{x} = \frac{300}{3} = 100 \text{ hours}$$

and using equation 8-11 and figure E-1:

$$\theta_{L=0.80} = \frac{2t}{\chi^2_{.80;\,8}} = \frac{(2)(300)}{11.030} = 54.397 \text{ hours.}$$

Indicating that the MTBF estimate is 100 hours and its 80% lower bound is 54.397 hours. Continuing,

$$\hat{R}(1) = e^{-\hat{\lambda}(1)} = e^{-.01000} = 0.9900$$

$$R_L(1) = e^{-\lambda_U(1)} = e^{-.01838} = 0.9818.$$

Indicating that the reliability estimate for a one hour mission is 0.9900 and its 80% lower bound is 0.9818.

8.4 NON-EXPONENTIAL RELIABILITY DEMONSTRATION

Two procedures for reliability demonstration are presented in this paragraph; one is taken from NAVORD OD 41146[8] and the other is based on a binomial method.

8.4.1 NAVORD OD 41146 (Sequential Ratio Test) Method

Procedures for demonstration (i.e., acceptance of a hypothesis, $H_0$, or the alternate hypothesis, $H_1$) when the exponential assumption is not valid are found in NAVORD OD 41146[8]. Equipment designs that employ redundancy and those which must be judged on an attribute basis are in this class. These procedures are illustrated by example for a complete reliability specification (figure 8-10) for a hypothetical launcher system.

Reliability Specification

            Reliability     Producer's     Minimum Acceptable     Consumer's
            Objective       Risk           Reliability            Risk
System      $R_0$           $\alpha$       $R_1$                  $\beta$

Launcher    .98             .20            .93                    .20

Figure 8-10. Launcher Reliability Objectives

Step 1. Establish the Hypothesis and the alternate.

In this example the design requirement is 0.98. The hypothesis is therefore

$$H_0: Q_0 < 0.02 \text{ at } t = T$$

where

$$Q_0 < (1 - R_0) = (1 - .98)$$

and T is the defined mission time.

The alternate hypothesis

$$H_1: Q_1 > 0.07 \text{ at } t = T$$

where

$$Q_1 > (1 - R_1) = (1 - .93)$$

and T is the defined mission time.

Step 2. Define the Accept/Reject Decision Boundaries.

The accept and reject lines are defined:

Accept Line: $x_n = -h_0 + ns$

Reject Line: $x_n = +h_1 + ns$


where

n = sample size
s = slope

then:

$$h_0 = \frac{\ln\dfrac{1-\alpha}{\beta}}{\ln\left[\dfrac{Q_1(1-Q_0)}{Q_0(1-Q_1)}\right]} = \frac{\ln\dfrac{1-.20}{.20}}{\ln\left[\dfrac{0.07\,(1-0.02)}{0.02\,(1-0.07)}\right]} = \frac{\ln 4.000}{\ln 3.688} \tag{8-12}$$

Therefore

$$h_0 = 1.062$$

also,

$$h_1 = \frac{\ln\dfrac{1-\beta}{\alpha}}{\ln\left[\dfrac{Q_1(1-Q_0)}{Q_0(1-Q_1)}\right]} = \frac{\ln\dfrac{1-.20}{.20}}{\ln\left[\dfrac{.07\,(1-.02)}{.02\,(1-.07)}\right]} \tag{8-13}$$

$$h_1 = 1.062$$

and,

$$s = \frac{\ln\left[\dfrac{1-Q_0}{1-Q_1}\right]}{\ln\left[\dfrac{Q_1(1-Q_0)}{Q_0(1-Q_1)}\right]} = \frac{\ln\dfrac{1-.02}{1-.07}}{\ln 3.688} \tag{8-14}$$

$$s = 0.040$$

The Decision boundaries are:

Accept Line: $x_n = -1.062 + 0.040\,n$

Reject Line: $x_n = +1.062 + 0.040\,n$

These lines can be converted to accept/reject criteria by solving for n:

Accept when $n \ge 26.55 + 25\,x_n$

Reject when $n \le -26.55 + 25\,x_n$

These two lines are plotted in figure 8-11 and further illustrated in figure 8-12.

[Figure 8-11. Sequential Test Plan (Non-Exponential), $R_0 = .98$, $R_1 = .93$, $\alpha = \beta = 20\%$. One axis is labeled "cumulative failures".]

Number of      Minimum Sample     Maximum Sample
Failures       to Accept          to Reject
x              $n_A$              $n_R$

0              27                 -
1              52                 -
2              77                 23
3              102                48
4              127                73
5              152                98
6              177                123
7              202                148
8              227                173
9              252                198
10             277                223

Figure 8-12. Acceptance and Rejection Sample Sizes

This test could be terminated (§ 8.3.1.2) at seven failures and 175 samples. (Truncation would change the lower portion of figure 8-12.)

8.4.2 Binomial Interval Estimation Method

If a device is to be used only once (e.g., a missile) or is tested and used in missions of uniform length, and if its performance in any mission may be characterized unambiguously as a success or failure, and if its reliability is stationary over successive missions, then each mission may be viewed as a Bernoulli trial and the device reliability as a binomial parameter.

Terming the reliability R and the failure probability Q = 1-R, the probability of observing x failures in n tests is

$$P_f(x) = \binom{n}{x}(1-R)^x R^{\,n-x} \tag{8-15}$$

where

$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$

If p is the true parameter, the probability of observing as few as x failures in n trials is

$$\sum_{i=0}^{x} P_f(i)$$

Having observed x failures in n trials we may associate $100(1-\beta)\%$ confidence with a statement that the true failure probability is equal to or less than the upper limit $p_u$ necessary to satisfy the equality

$$\sum_{i=0}^{x} P_f(i) = \beta$$

It has been shown [9] that an upper bound on failure probability is given by

$$Q_U = \frac{1}{1 + \dfrac{n-x}{x+1}\,\dfrac{1}{F_{1-\beta;\,f_1,f_2}}}$$

where $F_{1-\beta;\,f_1,f_2}$ is the $100(1-\beta)$ percentage point of the F distribution with degrees of freedom $f_1 = 2(x+1)$ and $f_2 = 2(n-x)$. The reliability is then bounded by $R_L = 1 - Q_U$ with $100(1-\beta)\%$ confidence.

A lower bound on failure probability, usually of less interest in demonstration testing, is given by

$$Q_L = \frac{1}{1 + [(n-x+1)/x]\,F_{1-\alpha;\,f_1,f_2}}$$

where $f_1 = 2(n-x+1)$ and $f_2 = 2x$. A $100(1-\alpha)\%$ upper bound on reliability is then $R_U = 1 - Q_L$.

Using s = n-x as the number of successes, the $(100 \times \beta)\%$ lower limit or bound for reliability is

$$R_L = \frac{1}{1 + [(n-s+1)/s]\,F_{1-\beta;\,f_1,f_2}} \tag{8-16}$$

where

$$f_1 = 2(n-s+1), \quad f_2 = 2s$$

and

$$R_U = \frac{1}{1 + \dfrac{n-s}{s+1}\,\dfrac{1}{F_{1-\alpha;\,f_1,f_2}}}$$

where $f_1 = 2(s+1)$, $f_2 = 2(n-s)$.

Example: Given,

n = 20, x = 2, s = 18, $\beta = .80$

Find: $\hat{R}$ and $R_L$

$$\hat{R} = \frac{s}{n} = \frac{18}{20} = 0.90$$

$$R_L = \frac{1}{1 + [(n-s+1)/s]\,F_{1-\beta;\,f_1,f_2}}$$

$$f_1 = 2(n-s+1) = 6, \quad f_2 = 2s = 36$$

$$R_L = \frac{1}{1 + [(20-18+1)/18]\,F_{.20;\,6,36}}$$

From figure E-8: F = 1.5188

Therefore:

$$R_L = \frac{1}{1 + (.1667)(1.5188)}$$

$$R_L = 0.7980$$
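The binomial lower bound on reliability can also be obtained without F tables. The sketch below is an illustration under stated assumptions, not the manual's method: it solves the cumulative binomial equality directly by bisection, which is mathematically equivalent to the F-distribution form of equation 8-16. The parameter name `confidence` is hypothetical and is used in place of the manual's risk symbols; the inputs (20 trials, 2 failures, 80 percent confidence) are those of the example in this paragraph.

```python
import math


def binomial_cdf(x: int, n: int, q: float) -> float:
    """Probability of x or fewer failures in n trials with failure probability q."""
    return sum(math.comb(n, i) * q**i * (1.0 - q)**(n - i) for i in range(x + 1))


def failure_probability_upper_bound(x: int, n: int, confidence: float) -> float:
    """Upper bound Q_U: the q for which P(x or fewer failures) = 1 - confidence."""
    if x >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        # The cumulative probability falls as q rises, so move toward the root.
        if binomial_cdf(x, n, mid) > 1.0 - confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n_trials, n_failures = 20, 2
r_hat = (n_trials - n_failures) / n_trials
q_upper = failure_probability_upper_bound(n_failures, n_trials, confidence=0.80)
r_lower = 1.0 - q_upper
print(round(r_hat, 2), round(r_lower, 3))
```

The result agrees with the tabulated-F calculation to about three decimal places; small differences come from rounding in the F table.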


8.5  DEMONSTRATION  IN  THE  PRESENCE 
OF  RELIABILITY  GROWTH 

A  complex  system  as  initially  assembled 
will  generally  contain  a  number  of  incipient 
failure  modes.  The  most  serious  modes  com¬ 
mand  first  attention  and,  as  they  are  cor¬ 
rected,  unmask  progressively  less  significant 
defects.  Reliability  grows  by  this  process, 
which  is  activated  and  continued  by  testing, 
observing  failures  and  correcting  their  causes. 
The  failure  analysis  and  corrective  action 
activities  required  by  government  develop¬ 
ment  contracts  are  a  closed-loop  process  de¬ 
signed  to  ensure  that  reliability  will,  in  fact, 
increase  with  increasing  program  time.  The¬ 
oretically,  once  a  failure  occurs,  testing 
stops,  failure  analysis  and  corrective  action 
is  performed  and  the  test  continues  after  cor¬ 
rective  action  implementation.  Monitoring  of 
reliability  growth  begins  early  in  the  develop¬ 
ment  test  program.  Data  to  be  used  in  the 
measurement  of  reliability  growth  will  be 
determined  from  a  review  of  the  integrated 
test  program  but  should  start  with  early  engi¬ 
neering  evaluation  testing. 

Reliability theory recognizes early growth but the hypothesis testing methods discussed above treat only mature systems of constant failure rate and constant failure probability. Experience has shown, however, that a large and complex system may require several thousand system-hours of testing and operation before its failure rate stabilizes sufficiently for a static model to represent its reliability accurately.

Under  these  conditions,  the  measurement 
or  demonstration  problem  is  one  of  esti¬ 
mating  the  current  values  of  changing  reliabil¬ 
ity  parameters.  Analogously,  the  prediction 
problem  becomes  one  of  forecasting  future 
values,  particularly  ultimate  static  values. 

In order to monitor the progress of reliability growth, a model of the growth process is needed. The model proposed by J.T. Duane [2] in 1964, and already described in Section 5, was based on empirical analysis of aerospace equipment failure data. It is applicable to measurement at the same assembly level for which data are collected.

As noted in Section 5, the model is

$$\lambda_T = K T^{-\alpha} \tag{8-17}$$

where $\lambda_T$ is cumulative average failure rate, the ratio of total failures x to total operating time T. In Duane's model the instantaneous failure rate $\lambda$ changes at the same rate as $\lambda_T$ and is displaced from $\lambda_T$ by the constant factor $(1-\alpha)$.

$$\lambda = \frac{dx}{dT} = \frac{d(\lambda_T T)}{dT} = \frac{d(K T^{1-\alpha})}{dT} = (1-\alpha) K T^{-\alpha} = (1-\alpha)\lambda_T \tag{8-18}$$

Plotted on logarithmic scales the curves are linear and parallel to each other (figure 8-13).
K  can  be  interpreted  as  the  initial  or  time-zero 
failure  rate.  It  is  a  function  of  initial  design 
quality,  system  complexity,  maturity  of  the 
system  relative  to  the  state-of-the-art,  quality 
of  shop  operations  and  other  variables.  Ex¬ 
perience  has  shown  that  it  is  seldom  less  than 
10  times  the  specified  target  value.  Lacking  a 
better  estimate,  a  safe  initial  procedure  is  to 
assume  K  at  10  times  the  predicted  failure 
rate  (or  10  percent  of  the  predicted  MTBF, 
since  a  growth  curve  can  be  plotted  using 
either parameter). The curve is more sensitive to the exponent $\alpha$ than to K. The exponent reflects the intensity with which reliability improvement is obtained; it nearly always lies between .2 and .5, the average being close to .3.

The  form  of  presentation  shown  in  figure 
8-13  has  two  practical  advantages;  it  provides 
a  reasonable  visual  fit  of  test  data  to  the 
growth  model,  and  it  provides  a  best  estimate 
line  for  past,  present  and  future  observations 
with  minimum  computational  effort. 
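The straight-line fit on logarithmic scales can be sketched numerically. This is a minimal illustration, assuming an ordinary least squares fit of $\ln \lambda_T$ against $\ln T$; the function names are hypothetical and the data are synthetic, generated from equation 8-17 with K = 0.5 and $\alpha = 0.3$ rather than taken from any test program.

```python
import math


def fit_duane(times, cumulative_failures):
    """Least squares fit of ln(lambda_T) = ln K - alpha * ln T; returns (K, alpha)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(n / t) for t, n in zip(times, cumulative_failures)]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    alpha = -slope
    k = math.exp(mean_y - slope * mean_x)
    return k, alpha


def instantaneous_failure_rate(k: float, alpha: float, t: float) -> float:
    """Equation 8-18: lambda = (1 - alpha) * K * T**(-alpha)."""
    return (1.0 - alpha) * k * t ** (-alpha)


# Synthetic data: cumulative failures x = K * T**(1 - alpha), K = 0.5, alpha = 0.3.
times = [100.0, 200.0, 500.0, 1000.0, 2000.0]
failures = [0.5 * t ** 0.7 for t in times]

k_fit, alpha_fit = fit_duane(times, failures)
current_rate = instantaneous_failure_rate(k_fit, alpha_fit, times[-1])
print(round(k_fit, 3), round(alpha_fit, 3), current_rate)
```

With real data the points scatter about the line, and the fitted exponent can be compared with the .2 to .5 range quoted above.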

Today, the Duane model is embodied in MIL-STD-1635[3], the proposed MIL-STD-781D[13], and other military standards.


Figure 8-13. Reliability Growth Model

Section 9

RMA  DATA  SYSTEM 


NAVSEA OD 21549[1] requires the establishment and maintenance of an Integrated Data System (IDS) whose primary purpose is the implementation of a cost effective and comprehensive data program to support all engineering activities. This section addresses a subset of the IDS, the data which is primarily used for RMA evaluation and the quality of that data.

9.1 INTRODUCTION

The  data  system  is  required  to  collect,  con¬ 
trol,  process,  distribute  and  store  essential 
information.  The  system  must  provide  for 
data  maintenance  and  be  capable  of  timely 
data  retrieval.  The  system  must  support  the 
needs  and  objectives  of  the  RMA  (hardware 
and  software)  evaluation  program. 

The data system should be flexible in output formats (sorts) to aid the analysis effort and be able to distinguish between the many types of data inputs required to satisfy data users' needs. Simple analysis assistance (computer) programs such as time counting, failure counting, repair counting, threshold alarms, point estimates, "white space" analysis,* plotting, etc. can be of great assistance to the analysis effort and can be implemented without difficulty if data entries are standardized.

* "White Space" Analysis - A failure data analysis technique, such that part, equipment, subassembly, etc. failures are sorted by the part number, for example. Rather than listing a part number, such as R1 (for resistor R1), repetitively for each failure of R1, the sort program or manual listing should print only the first R1, blanking the remainder off the listing. The created "white space" between the listing of R1 and the listing of the first R2 alerts the analyst to the magnitude of a problem.
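A "white space" listing of the kind described in the footnote can be sketched as follows. The record layout (a part number paired with a short failure description) and the function name are assumptions made for illustration; an actual data system would use its own standardized entries.

```python
def white_space_listing(failures):
    """Sort failure records by part number and blank repeated part numbers.

    `failures` is a list of (part_number, description) tuples. Only the first
    line for each part shows the part number; the rest start with blanks.
    """
    lines = []
    previous_part = None
    for part, description in sorted(failures, key=lambda record: record[0]):
        label = part if part != previous_part else ""
        lines.append(f"{label:<8}{description}")
        previous_part = part
    return lines


# Hypothetical failure records, for illustration only.
records = [
    ("R2", "open circuit"),
    ("R1", "value drift"),
    ("R1", "short"),
    ("R1", "open circuit"),
]
listing = white_space_listing(records)
for line in listing:
    print(line)
```

The three lines under R1 with only one printed label make the repeated failures of that part stand out at a glance.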

The  RMA  data  system,  as  an  element  of 
the  IDS,  must  have  explicit  definitions  of 
tasks,  responsibilities  and  required  coordi¬ 
nation.  Development  of  the  system  requires 
an  analysis  of  program  measurement  and 
assessment  objectives  to  establish  the  program 
data  needs.  Extensive  interface  between  the 
IDS  and  the  integrated  test  program  (ITP)  is 
required  to  assure  that  all  RMA  measurement 
and  assessment  objectives  are  fulfilled,  that  all 
data  needs  are  defined,  that  necessary  tests 
are  planned,  and  that  provisions  for  collec¬ 
tion,  control,  and  processing  data  into  useful 
output  reports  are  properly  coordinated. 
Data  control  procedures  should  be  prepared 
to  monitor  the  completeness,  conciseness, 
legibility,  accuracy,  and  validity  of  reported 
data,  and  to  format  and  input  the  data  for 
computer  and/or  manual  use.  Computer 
programs  for  data  processing  are  written  to 
enable  accurate  test  history  files  to  be  gener¬ 
ated  and  for  analysis  assistance.  Data  utili¬ 
zation  procedures  are  developed  to  make  use 
of  the  summaries  and  reports  compiled  from 
the  history  files.  Development  of  data  system 
functions  is  discussed  below  as  related  to  the 
hardware  and  software  portions  of  systems 
undergoing  development,  production,  and 
fleet  service. 

Once  established,  the  RMA  data  system 
should  be  applicable  to  any  hardware  or 
software  system  or  program  (upon  definition 
of  requirements  and  mission  information 
unique  to  a  program)  and  it  should  be  able  to 
handle  multiple  programs  concurrently.  The 
contractor  should  strive  to  make  operation 
of  the  data  system  straightforward,  self- 
regulating  and  largely  routine. 

Figure 9-1 is an analysis of RMA data
needs. 

A  Flow  Chart  for  RMA  data  is  shown 
in  figure  9-2. 

Figure 9-2. RMA Data Flow Chart

9.2 DATA COLLECTION

Detailed  procedures  and  instructions  must 
be  issued  to  all  organizational  units  and 
operations  personnel  who  participate  in  the 
generation  and  recording  of  basic  (raw) 
RMA data. These procedures must define
the  data  to  be  reported,  the  forms  to  be  used, 
the  instructions  for  completing  the  forms, 
and  the  approval  process  necessary  to  assure 
data  quality.  Reporting  forms  should  include 
as  much  preprinted  information  as  possible 
with  emphasis  placed  on  ease  of  entering  data 
such as “check” box usage. The use of preprinted
information reduces transcription and
recording errors and saves time; if mechanized
data processing techniques are used,
standardized forms also reduce errors
and programming costs.

The forms should encourage participating
personnel to record concise information by
providing sufficient choices and avoiding the
use of “other”, “none of the above”, etc.
However, while encouraging concise information,
the form and instructions should not
require information which cannot be provided
without guessing, such as asking for the
“reason for failure” (rather than the symptom
of failure) before failure analysis is performed.

Some  aspects  of  and  sample  forms  for 
hardware  and  software  data  collection  are 
discussed  below. 

9.2.1  Hardware  Data  Collection 

Data  collection  by  means  of  test  and  oper¬ 
ation  logs,  test  result  forms,  failure  forms  or 
others  must  be  comprehensive  and  accurate. 
Development of logs and forms and the
instructions and training for proper data
recording have a significant impact on the data
collection effort and are essential to
accomplishing the measurement of RMA. Basic
information needed for hardware RMA
evaluation is listed in figure 9-3. Information
categories A, B and C are needed for
reliability evaluation. Categories A through D
are  needed  for  maintainability  evaluation. 
Categories  E  and  F  provide  necessary  analy¬ 
tical  and  reference  data,  respectively. 

9.2.1.1 Attribute Data Collection

The  collection  of  attribute  (go/no-go) 
operational  and  test  data  can  provide  the 


most  error-free  recording  of  information 
since  personnel  interpretations  of  infor¬ 
mation  from  meters  and  other  measuring 
devices are minimized. Figure 9-4 illustrates
a sample form for collecting attributes data
during a test; the form is provided to the test
operator with the standard program
information preprinted. Test personnel would
complete  the  serial  number,  the  date  of  test, 
length  of  test,  test  result  (pass  or  fail),  failure 
report  number  (if  applicable),  calibration 
dates/status,  remarks  and  the  personnel 
information. 


9.2.1.2 Variables Data Collection

Collecting  variables  data  permits  the  ana¬ 
lyst  to  make  more  efficient  use  of  the  data. 
For  example,  data  taken  on  the  critical  per¬ 
formance  parameters  of  six  samples  of  device 
A  and  six  samples  of  device  B  are  shown  in 
figure  9-5. 

The  Specification  limits  for  the  critical 
performance  parameters  are  provided  in 
figure  9-6. 

In both cases (A and B) we have six tests
with no failure. Therefore, attribute analysis
yields the same result for both designs.
Variables analysis indicates that design B is
superior (and also raises the estimate for
design A), as shown in figure 9-7. The result
obtained using variables analysis is in
agreement with our intuition, which tells us
that design B is less apt to exceed the
specification limits. The attribute analysis
indicates the designs are equally good, contrary
to our intuition, which highlights the value of
making efficient use of the data.
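
The comparison can be reproduced approximately with the sketch below. The manual does not state the formulas behind figure 9-7, so two assumptions are made here: the attribute values are taken as zero-failure binomial lower bounds (the point estimate being the 50 percent bound), and the variables point estimate is taken as the probability that a normally distributed reading, fitted to the six samples, falls inside the specification limits. Under those assumptions the sketch matches the attribute values (.8909 and .6070) and comes close to the variables point estimate for design A (.9355). The 95 percent lower limit for variables data requires tolerance-limit methods and is not attempted. Function names are illustrative only.

```python
from statistics import NormalDist, mean, stdev


def attribute_bound(n_tests, confidence):
    """Lower reliability bound after n_tests trials with zero failures."""
    return (1.0 - confidence) ** (1.0 / n_tests)


def variables_point_estimate(readings, lsl, usl):
    """Probability of a reading within [lsl, usl], assuming normality."""
    fitted = NormalDist(mean(readings), stdev(readings))
    return fitted.cdf(usl) - fitted.cdf(lsl)


# Readings from figure 9-5 and limits from figure 9-6.
design_a = [1.30, 3.55, 5.35, 7.55, 9.50, 11.75]
design_b = [5.95, 6.15, 6.40, 6.60, 6.90, 7.20]
LSL, USL = 0.0, 14.6

attr_point = attribute_bound(6, 0.50)   # same for A and B
attr_lower = attribute_bound(6, 0.95)   # same for A and B
var_a = variables_point_estimate(design_a, LSL, USL)
var_b = variables_point_estimate(design_b, LSL, USL)

print(f"attributes: point {attr_point:.4f}, 95% lower {attr_lower:.4f}")
print(f"variables:  design A {var_a:.4f}, design B {var_b:.4f}")
```

Both designs have nearly the same mean, but design A has a much larger spread, which is why only the variables analysis separates them.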

Since variables data are more difficult
(the probability of collecting erroneous data
increases) and more costly to collect, process
and analyze, and since the probability of
operation within specification must be related
to a number of performance parameters, the
analyst must trade off the requirement to
collect variables data against the cost of
obtaining this information. The evolution of
automatic test equipment, with “result”
measurements and printout capability, has
provided the ability to collect variables data
more accurately.
Also,  the  use  of  preprinted  information, 
especially  minimum  and  maximum  limits,  is 
of  value  in  reducing  errors  in  collecting 

A. Test Description

1.  Test  Report  Number 

2.  Test  Level  -  Component,  Equipment  or  Subsystem 

3.  Test  Type  -  Qualification,  Acceptance,  etc. 

4.  Test  Site 

5.  Test  Environment 

6.  Date  of  Test 

7.  Test  State  -  Operating,  Non-Operating  or  Cycling 

8.  Test  Plan  Number 

9.  Test  Procedure  Number 

B.  Hardware  Identification 

1. Hardware Name

2.  Hardware  Drawing  Number 

3.  Hardware  Serial  Number 

4.  Hardware  Level 

5.  Sub-hardware  Actually  Involved  in  Test 

6.  Subcontractor 

7.  Project 

C.  Test  Results 

1.  Sample  Size 

2.  Operating  Time  or  Cycles 

3.  Operating  Mode 

4.  Test  Environment  (Temperature,  vibration,  etc.) 

5.  Failures  (Number  of) 

6.  Failure  Report  Numbers 

7.  Failure  Classification 

D.  Maintenance  Data 

1.  Corrective  Maintenance  Functions 

2.  Corrective  Maintenance  Time  Components 

3.  Delay  Time  Components 

4.  Modification  Time 

5.  Time-to-Failure 

E.  Relevant  Analytical  Data 

1.  Apportioned  Hardware  R&M  Requirements 

2.  Predicted  Hardware  R&M 

3.  Failure  Modes  Predicted  by  FMECA 

4.  Previously  Implemented  Corrective  Action 

5.  Mission  Profile  (Environment  and  Duty  Cycle  Profiles) 

6.  Math  Model 

7.  Generic  Failure  Rate  Data 

F.  Reference  Information 

1. Date of Entry (Date Information Reaches Computer File)

2.  Test  Report  Number  (References  a  High  Level  Test  from  Which  Record 
Was  Generated) 

3.  Failure  Report  Number  (References  a  Higher  Level  Failure  Report) 

4.  Project  Code  (Identifies  Data  Used  from  Another  Contractual  Program) 


Figure  9-3.  Information  Useful  for  Reliability  and  Maintainability  (Availability)  Evaluation 

PROGRAM: Poseidon                      TEST REPORT NUMBER: P42735
NAME: Recovery Programmer              DRAWING NUMBER: 692D912P003
SERIAL NUMBER:                         LEVEL: Component
TYPE: FAT                              TEST SITE: Factory Test Lab
TEST INSTRUCTION (TI): SYS 3793        TEST LEVEL: Component
SECURITY CLASSIFICATION: Unclassified

                                          LENGTH
TI ¶     DATE   ENVIRONMENT   STATE*   (MINUTES/CYCLES)   PASS/FAIL   FAILURE REPORT NO.

3.1.1           Bench         D        -
3.1.2           Bench         C        -
3.1.3           High Temp     A        -
3.1.4           High Temp     D        -
3.1.5           High Temp     C
3.1.6           Bench         D        -
3.1.7           Bench         C
3.1.8           Vibration     A        -
3.1.9           Bench         D        -
3.1.10          Bench         C        -

TEST EQUIPMENT USED

EQUIPMENT NAME         -      MODEL   CALIBRATION DATE   CALIBRATION STATUS

Test Set               TS     4273
Brush Recorder         BR     418
Vibrator               MB     11
Temperature Chamber    TC     4823
Voltmeter              V      234
Oscilloscope           TECT   4817

REMARKS:

TESTER:                        DATE:
QC ENGINEER APPROVAL/DATE:
DATE TO DATA SYSTEM:

*State A - Non-operating but must survive and be operational in a later mission phase.
       B - Non-operating but must not operate prematurely and must be operational in a later mission phase.
       C - Operating. Duration countable in cycles or discrete events.
       D - Operating. Duration measurable in units of time.


Figure 9-4. Performance Data Sheet - Attributes Data


DESIGN   DATA
         #1      #2      #3      #4      #5      #6

A        1.30    3.55    5.35    7.55    9.50    11.75
B        5.95    6.15    6.40    6.60    6.90    7.20

Figure 9-5. Data Taken on Devices A and B


LIMITS

LSL       0.0
Nominal   7.3
USL       14.6

Figure 9-6. Specification Limits


                        RELIABILITY
          ATTRIBUTES                  VARIABLES
DESIGN    Point       95% Lower       Point       95% Lower
          Estimate    Conf Limit      Estimate    Conf Limit

A         .8909       .6070           .9355       .8899
B         .8909       .6070           .9999+      .9999+

Figure 9-7. Comparison of Results


variables  data.  Figure  9-8  is  an  example  of 
a  form  that  can  be  used  for  this  purpose. 

9.2.1.3 Operating Data Collection

Operating  data  (time  meter  readings, 
operating  mode,  operating  environment,  oper¬ 
ating  results,  etc.)  at  the  system  and  subtier 
level  can  be  reported  by  means  of  test  opera¬ 
tion  records  or  logs  such  as  figure  9-9.  The 
design  and  information  content  of  these 
logs  should  be  tailored  to  the  system  under 
evaluation  (Figures  9-9  and  9-10  are  exam¬ 
ples).  Typically,  line  entries  should  be  made 


to  fully  describe  each  equipment  test  state 
and  the  duration  of  the  state.  A  new  log 
entry  should  be  made  to  document  each 
change  in  conditions  of  the  test  such  as  fail¬ 
ure  occurrence,  state  of  the  equipment  under 
test or change in usage mode. Note that the
log should provide space for any variables
data that must be recorded, along with the
minimum and maximum values of each recorded
or measured parameter. Responsibility for maintenance
of  logs  should  rest  with  the  test  supervisor 
during  periods  of  testing,  training,  or  experi¬ 
mentation,  and  with  the  system  maintenance 
supervisor  at  other  times. 
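
Because a new log entry is made at every change of condition, the time spent in each state follows directly from consecutive entries. The sketch below illustrates the idea under stated assumptions: the record layout, the field names and the sample entries are hypothetical, and the state codes are borrowed from the legend of figure 9-4.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogEntry:
    start_minutes: float   # elapsed-time reading when this condition began
    state: str             # state code, e.g. "A" through "D" (figure 9-4)
    note: str = ""


def time_in_state(entries, end_minutes):
    """Total minutes per state; each entry lasts until the next one begins."""
    totals = defaultdict(float)
    ordered = sorted(entries, key=lambda e: e.start_minutes)
    for current, following in zip(ordered, ordered[1:] + [None]):
        stop = end_minutes if following is None else following.start_minutes
        totals[current.state] += stop - current.start_minutes
    return dict(totals)


# Hypothetical log: one line per change in test conditions.
log = [
    LogEntry(0.0, "A", "non-operating, awaiting test"),
    LogEntry(30.0, "D", "operating, bench"),
    LogEntry(150.0, "A", "failure observed, unit shut down"),
    LogEntry(210.0, "D", "operation resumed after repair"),
]
totals = time_in_state(log, end_minutes=300.0)
print(totals)
```

A log kept this way yields the operating time needed for reliability measurement without any separate bookkeeping.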


PROGRAM: POSEIDON                      TEST REPORT NUMBER: PV42736
TYPE: FAT                              LEVEL: Component
NAME: Recovery Programmer              DRAWING NUMBER: 692D912P003
SERIAL NUMBER:                         SECURITY CLASSIFICATION: Unclassified
TEST INSTRUCTION (TI): SYS 3793        TEST LEVEL: Component
DATE TO DATA SYSTEM:

TI          Description                      Unit of    Class of    Specification   Actual*    Date
Paragraph   of Test             Environment  Measure    Character   Limits          Readings   of Test

2.2.1       Visual Inspection   Bench        P/F        M           Pass/Fail
2.2.2       Light Indication    Bench        P/F        M           Pass/Fail
2.2.4       Beacon Voltage      Bench        Volts      M           13.2  35.6
2.2.5       Timer 1 T-t         Bench        Seconds    C           [illegible]  15
2.3.4       Beacon Voltage      High Temp    Volts      M           13.5-16.0
2.3.5       Timer 1 T-e         High Temp    Seconds    C           0.4  12

REMARKS:

TESTER:                        DATE:
QC ENGINEER APPROVAL:          DATE:

*Circle all out of specification readings


Figure 9-8. Performance Data Sheet - Variables Data

Figure 9-9. Test Operations Log


Figure 9-10. Typical Log Form

9.2.1.4 Failure Data Collection

When  a  failure  occurs,  a  failure  report 
should  be  generated  with  the  following 
information  recorded:  Failure  symptoms  and 
circumstances,  identity  of  the  failed  hard¬ 
ware,  conditions  at  the  time  of  failure,  cause 
of  failure  (if  known)  and  the  disposition 
recommended  or  made  of  the  failed  hardware. 
Figures 9-11 and 9-12 are examples of typical
failure  reports.  If  the  failure  occurs  during  a 
test,  the  test  document  number  should  be 
referenced  on  the  failure  report  and  vice 
versa.  Other  information  will  then  be  added 
to  the  report:  the  failure  classification  desig¬ 
nations  (catastrophic,  critical,  major,  minor), 
relevance  classification  (include  or  exclude 
the  failure  for  reliability  measurement), 
fault  isolation  data  (identification  of  failed 
elements),  and  the  environment  (the  test 
environment,  if  applicable)  in  which  the 
failure occurred.
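
The two stages of a failure report, the originator's entries and the information added during review, can be pictured as a record whose later fields start out empty. The sketch below is an illustration only: the class and field names are hypothetical and do not reproduce the forms of figures 9-11 and 9-12.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class FailureClass(Enum):
    CATASTROPHIC = "catastrophic"
    CRITICAL = "critical"
    MAJOR = "major"
    MINOR = "minor"


@dataclass
class FailureReport:
    # Recorded by the originator when the failure occurs.
    report_number: str
    hardware_id: str
    symptoms: str
    conditions: str
    test_document: Optional[str] = None   # cross-reference, if failure occurred in test
    cause: Optional[str] = None           # only if known
    disposition: str = ""
    # Added later by failure investigation.
    classification: Optional[FailureClass] = None
    relevant: Optional[bool] = None       # include or exclude for reliability measurement
    failed_elements: Tuple[str, ...] = ()
    environment: str = ""


def awaiting_review(reports):
    """Report numbers still lacking classification or relevance."""
    return [r.report_number for r in reports
            if r.classification is None or r.relevant is None]


def relevant_failures(reports):
    """Reports counted as failures for reliability measurement."""
    return [r for r in reports if r.relevant is True]


# Hypothetical reports.
reports = [
    FailureReport("FR-101", "S/N 0007", "no beacon output", "high temp",
                  test_document="P42735",
                  classification=FailureClass.MAJOR, relevant=True),
    FailureReport("FR-102", "S/N 0007", "connector damaged", "bench",
                  classification=FailureClass.MINOR, relevant=False),
    FailureReport("FR-103", "S/N 0009", "timer out of limits", "bench"),
]
print(awaiting_review(reports))
print(len(relevant_failures(reports)))
```

Keeping the review fields separate from the originator's fields avoids asking the originator for information, such as cause, that cannot be supplied without guessing.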

Failure  reports  and  other  source  docu¬ 
ments  originating  in  test  or  service  areas 
should  be  routed  promptly  for  processing. 
Failure  reports  should  go  directly  to  failure 
investigation  for  review  and  classification; 
then  reproduced  and  distributed.  Test  data 
sheets  and  completed  log  sheets  should  go  to 
data  control  for  screening,  reproduction, 
recording  in  computer  files  and  distribution. 
Some  reports  cannot  be  totally  completed 
by  the  personnel  responsible  for  detecting 
a  failure.  Information,  such  as  “repaired/ 
replaced  item”  may  require  another  acti¬ 
vity’s  input.  Methods  for  handling  these 
situations  should  be  addressed  in  the  data 
system  procedures.  Techniques  such  as  the 
use  of  multilayer,  multicolor  chemically 
treated  forms  are  available  such  that  addi¬ 
tional  information  can  be  added  to  the  final 
copy  of  the  form  while  all  data  originating 
activities  can  keep  a  copy  of  their  own 
inputs. 

9.2.2  Software  Data  Collection 

Software  is  defined  as  computer  programs 
and  data  processed  in  a  computer.  It  is  a 
major  element  of  many  current  military 
systems  and  may  be  the  reliability-limiting 
element  in  a  system  because  of  its  complex¬ 
ity.  When  software  is  part  of  a  system  devel¬ 
opment,  the  contractor's  reliability  data 
system  should  permit  software  running  time 


and  software  error  data  to  be  collected, 
evaluated,  and  reported,  so  that  management 
can  measure  the  growth  of  software  reliability 
and  forecast  the  test  time  needed  to  reach  a 
satisfactory  level  of  software  reliability. 
Software  errors,  once  detected,  are  defined  as 
failures.  All  errors  detected  after  internal 
release  of  software  modules  should  be  re¬ 
ported,  analyzed  and  classified  not  only  to 
measure  reliability,  but  to  evaluate  the 
software  development  tools  and  techniques 
in  use  and  to  provide  information  for  plan¬ 
ning  and  improving  future  software  develop¬ 
ment  projects. 

Many  organizations  that  develop  large  scale 
military  software  employ  multiple  forms  to 
collect software data. However, a single form
can be used satisfactorily to collect software
reliability data if the form includes information
on the occurrence of errors (i.e., detection
of software errors and omissions), information
gained by subsequent analysis of detected
errors and omissions, documentation
of corrections and modifications and their
verification, together with certain category
data needed for statistical analysis of software
development progress.

Figure  9-13  is  an  example  of  a  compre¬ 
hensive  software  trouble  report  (TR).  Data 
are  reported  on  the  form  as  indicated  below. 
The form is initiated by test or operating
personnel who detect a software error. Fields
marked with an asterisk are filled in by the
analyst or review team who investigate the
error.

1. Date - The date the trouble report (TR)
form is prepared.

*2. Error Category - Three-character error
code from list given on back of TR form.

*3. Criticality - Circle appropriate severity
code: H = High, M = Medium, L = Low,
NA = Not Applicable.

*4. TR Number - Trouble report number.

5. Title - A brief description of the
problem.

6. Program Designation - The official
designation of the computer program against
which the TR is written (NA for documentation
troubles).

7. Program Document - The official
designation of the program document against
which the TR is written, including page,
paragraph number, etc. (NA for program and
logic troubles).

Software Trouble Report (Reverse Side)

28. ANALYSIS                         DATE RECEIVED

ERROR PREVIOUSLY REPORTED ON TR NO(S) or F/M REPORT

SIGNATURE                            DATE

29. CORRECTIONS                      ECP NO.

CODE CHANGES

DOCUMENTATION CHANGES

CORE AND TIMING CHANGES

CORRECTIONS VERIFIED BY              DATE


ERROR CATEGORIES

A_0  COMPUTATIONAL ERRORS
A_1  Incorrect operand in equation
A_2  Incorrect use of parenthesis
A_3  Sign convention error
A_4  Units or data conversion error
A_5  Computation produces an over/under flow
A_6  Incorrect/inaccurate equation used
A_7  Precision loss due to mixed mode
A_8  Missing computation
A_9  Rounding or truncation error

B_0  LOGIC ERRORS
B_1  Incorrect operand in logical expression
B_2  Logic activities out of sequence
B_3  Wrong variable being checked
B_4  Missing logic or condition tests
B_5  Too many/few statements in loop
B_6  Loop iterated incorrect number of times (including endless loop)
B_7  Duplicate logic

C_0  DATA INPUT ERRORS
C_1  Invalid input read from correct data file
C_2  Input read from incorrect data file
C_3  Incorrect input format
C_4  Incorrect format statement referenced
C_5  End of file encountered prematurely
C_6  End of file missing

D_0  DATA HANDLING ERRORS
D_0  Data file not rewound before reading
D_1  Data initialization not done
D_2  Data initialization done improperly
D_3  Variable used as a flag or index not set properly
D_4  Variable referred to by the wrong name
D_5  Bit manipulation done incorrectly
D_6  Incorrect variable type
D_7  Data packing/unpacking error
D_8  Sort error
D_9  Subscripting error

E_0  DATA OUTPUT ERRORS
E_1  Data written on wrong file
E_2  Data written according to the wrong format statement
E_3
E_4
E_5  Incomplete or missing output
E_6  Output field size too small
E_7  Line count or page eject problem
E_8  Output garbled or misleading

F_0  INTERFACE ERRORS
F_1  Wrong subroutine called
F_2  Call to subroutine not made or made in wrong place
F_3  Subroutine arguments not consistent in type, units, order, etc.
F_4  Subroutine called is nonexistent
F_5  Software/data base interface error
F_6  Software user interface error
F_7  Software/software interface error

G_0  DATA DEFINITION ERRORS
G_1  Data not properly defined/dimensioned
G_2  Data referenced out of bounds
G_3  Data being referenced at incorrect location
G_4  Data pointers not incremented properly

H_0  DATA BASE ERRORS
H_1  Data not initialized in data base
H_2  Data initialized to incorrect value
H_3  Data units are incorrect

I_0  OPERATION ERRORS
I_1  Operating system error (vendor supplied)
I_2  Hardware error
I_3  Operator error
I_4  Test execution error
I_5  User misunderstanding/error
I_6  Configuration control error

J_0  OTHER
J_1  Time limit exceeded
J_2  Core storage limit exceeded
J_3  Output line limit exceeded
J_4  Compilation error
J_5  Code or design inefficient/not necessary
J_6  User/programmer requested enhancement
J_7  Design nonresponsive to requirements
J_8  Code delivery or redelivery
J_9  Software not compatible with project standards

K_0  DOCUMENTATION ERRORS
K_1  User manual
K_2  Interface specification
K_3  Design specification
K_4  Requirements specification
K_5  Test documentation

XX0  PROBLEM REPORT REJECTION
XX1  No problem
XX4  Duplicates another problem report
XX5  Deferred


Figure 9-13. Software Trouble Report Form (Continued)

8. Unit/Site - Ship or test site at which
trouble  was  detected. 

9.  Reel  -  Serial  number  of  tape  reel  used 
(NA  for  documentation  and  logic  troubles). 

10.  Reference  Document  -  The  document 
which  provides  the  basis  for  determining  that 
trouble  exists. 

11. Responsible Function - The operational
function of the computer program
affected by the trouble.

*12. Responsible Module(s) - Designation
of module(s) affected.

13. Test # Step - The test designation and
step  being  executed  at  the  time  the  trouble 
was  discovered  (NA  for  documentation  and 
logic  troubles).  Check  off  appropriate  box 
for  stage  of  testing  being  performed:  module 
verification  testing,  intermodule  compatibility 
testing,  system  validation  testing,  fleet  oper¬ 
ations. 

14.  Originator  -  Printed  name  of  the  indi¬ 
vidual  originating  the  TR. 

15. Activity Code - The activity and code
name  or  number  of  individual  originating 
the  TR. 

16.  Tel/Ext  -  The  office  phone  number 
and  extension  of  the  individual  originating 
the  TR. 

17. Trouble Description - Enter the date the
trouble was detected, the elapsed time meter
reading and the equipment having the meter
which was read. In the absence of elapsed time
meters, estimate or reconstruct running
time from system logs. Then write a sentence
defining the trouble, and develop
a word picture of events leading up to and
coincident with the problem. Structure
statements so that a programmer/test analyst
can recreate the situation. Cite equipment
being used, unusual cabling, etc. Indicate
console on line, modes, etc. if applicable.
If continuation sheets are required, fill in
page ___ of ___ at top of TR form.

18.  Run  Time  -  Elapsed  time  from  pro¬ 
gram  start  until  trouble  occurred  in  hours/ 
quarter  hours  (NA  for  documentation  and 
logic  troubles). 

19.  Simulation  Used  -  Program/equip¬ 
ments  used  to  simulate  operational  condi¬ 
tions.  Indicate  tape  reel  number  if  applicable 
(NA  for  documentation  and  logic  troubles). 
Indicate  test  tools  used  if  applicable. 

20. Linking With - Write in other link
sites/programs (NA for documentation and
logic troubles). Indicate interfaces with test
programs and tools.

21.  Configuration/Transients  in  Core  - 
Identify  configuration/transients  loaded  when 
trouble  occurred  (like  patches)  (NA  for 
documentation  and  logic  troubles). 

22.  Problems  Duplicated  -  Check  duplica¬ 
tion  attempts/success/failures  for  problem 
troubles  (NA  for  documentation  and  logic 
troubles). 

23.  Dump  Data  -  Write  in  serial  number 
of  reel  containing  dumped  data;  enter  loca¬ 
tion  of  each  program  call  number  on  dump 
tape  (NA  for  documentation  and  logic  trou¬ 
bles). 

24.  Special  Data  -  Enter  data  from  special 
entrance  cells  that  will  aid  programmer  in 
isolation  of  trouble  (NA  for  documentation 
and  logic  troubles). 

25,26,27.  Stop  Data  -  Designate  physical 
computer,  function  designation,  and  type  of 
stop  and  failure  when  applicable.  Check 
appropriate  box  to  indicate  if  stop  was  under 
normal  control  or  abnormal;  if  abnormal, 
indicate  whether  infinite  loop,  program 
pointer  out  of  bounds,  or  other  crash  condi¬ 
tion.  Write  in  the  register  content  if  relevant. 
(NA  for  documentation  and  logic  troubles). 

*28.  Analysis  -  Indicate  date  TR  received 
for  analysis  and  write  a  brief  summary  of 
findings.  Indicate  original  TR  number  if  TR 
is  found  to  be  duplicate,  that  is,  if  the  same 
error  has  already  been  reported.  Indicate 
failure report or failure/maintenance report
number if the problem first appeared on, or is
associated with, a hardware reporting form. Enter
printed name of person conducting analysis.
Enter date of analysis.

*29.  Corrections  -  Give  brief  summaries  of 
code  changes,  documentation  changes,  core 
and  timing  changes  resulting  from  analysis. 
Enter  number  of  any  ECP  resulting.  Enter 
printed  name  of  person  verifying  correc¬ 
tions  and  date  of  verification.  As  a  minimum, 
identify  the  block  (level)  of  code.  Detailed 
explanation  of  code  changes  may  be  included 
in  the  appropriate  engineering  change  pro¬ 
posal. 

9.2.2.1 Coding Software Error Source and
Type

The error code list given on the reverse of
the TR form was recommended by researchers
who studied many military software programs.
It is reduced from a much longer
initial set of categories and is adequate for the
majority of programs.

There  is  a  blank  character  in  the  3-charac¬ 
ter  alphanumeric  error  category  designator. 
This  blank  is  for  the  error  source.  Five  error 
source  designators  are  defined. 


Error
Source Code   Error Source   Description

0             Requirements   Source of problem is changing, ill conceived
                             or poorly stated performance requirement.

              Coding         Source of problem is an error in implementing
                             the design as code.

              Design         Source of problem is in preliminary or
                             detailed design.

              Maintenance    Source of problem is an error introduced in
                             process of trying to fix a previous error.

              Not Known      Source of error not known.


As  an  example  of  categorization  using  the 
3-character  designators,  A03  would  be  a  sign 
convention  computational  error  traceable  to 
an  origin  in  the  software  requirements. 

The error categories, when completely
recorded, define both sources and types of
errors. Even if analyses of software errors
prove inadequate to support assignment of
the source character, a considerable amount of
useful statistical analysis can be done using
only the two type characters. A category
is included for elective enhancements (J_6),
making the categories compatible with a
single-form system in which not every TR
reports an error.
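
A designator can be split into its type and source parts mechanically, which is what makes the statistical tallies mentioned above practical. The sketch below is a minimal illustration with hypothetical function names. It assumes the source digit occupies the middle position, as in the A03 example, and it maps only source code 0 (Requirements), since that is the only source code used in the text above. Rejection codes (XX0 through XX5) are not handled.

```python
from collections import Counter

# Type headings from the reverse side of the TR form (figure 9-13).
ERROR_TYPES = {
    "A": "Computational", "B": "Logic", "C": "Data input",
    "D": "Data handling", "E": "Data output", "F": "Interface",
    "G": "Data definition", "H": "Data base", "I": "Operation",
    "J": "Other", "K": "Documentation",
}

# Only the requirements source code is taken from the text; other
# source digits are passed through without being given a name here.
ERROR_SOURCES = {"0": "Requirements"}


def parse_category(code):
    """Split a designator such as 'A03' into (type key, source digit).

    The type key is written as on the form, e.g. 'A_3'. The source is
    None when the middle character has not been assigned.
    """
    if len(code) != 3 or code[0] not in ERROR_TYPES or not code[2].isdigit():
        raise ValueError(f"not a valid error category: {code!r}")
    type_key = code[0] + "_" + code[2]
    source = code[1] if code[1].isdigit() else None
    return type_key, source


def tally_by_type(codes):
    """Count trouble reports by error type, ignoring the source digit."""
    return Counter(parse_category(code)[0] for code in codes)


example = parse_category("A03")
tally = tally_by_type(["A03", "A_3", "B02", "A13"])
print(example, ERROR_SOURCES.get(example[1]))
print(tally)
```

Reports whose source has not yet been determined still contribute to the type tally, which is the point made in the paragraph above.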

9.2.2.2  Coding  Software  Error  Severity 

A general set of software error criticality
categories, which can be used as is or as a
basis to derive project-specific definitions
for use in block 3 of the TR form, is:

HIGH  (H):  An  error  which  significantly 
degrades  user’s  mission  or  prevents  its  com¬ 
pletion. 

MEDIUM  (M):  An  error  for  which  a 
workaround  is  available,  so  that  mission 
performance  is  not  significantly  degraded. 

LOW  (L):  An  error  which  does  not  af¬ 
fect  performance. 

NOT  APPLICABLE:  A  program  enhance¬ 
ment  or  an  error  which  cannot  be  verified 
or  repeated,  or  is  a  secondary  error,  or  is  the 
result  of  a  documentation  error  or  dupli¬ 
cation. 

9.2.2.3 Collecting Software Development
Cycle Data

Block 13 of the TR form provides space
for recording the stage of software testing in
which an error was detected: module verification
testing, intermodule compatibility
testing, system validation testing, or fleet
operation. This information is helpful for anal-


properly  begin  after  internal  release,  when 
the  debugged  module  is  first  placed  under 
control  of  a  formal  configuration  manage¬ 
ment  policy  (i.e.,  prior  to  module  integra¬ 
tion). 

9.2.2.4  Collecting  Software  Timing  Data 

The  TR  form  provides  block  18  for  run 
time  at  error  detection.  While  this  informa¬ 
tion  is  often  of  value  to  the  error  analyst, 
the  validity  of  run  time  or  CPU  time  sta¬ 
tistics  for  real  time  systems  is  questionable, 
since  typically  all  computers  are  running 
whenever  the  system  is  in  use.  Moreover, 
these  are  not  the  time  data  needed  for  reli¬ 
ability  measurement.  Cumulative  module 
use  time  from  date  of  internal  release  is 
needed.  This  information  must  usually 
be  synthesized  from  knowledge  of  internal 
release  date  and  cumulative  system  oper¬ 
ating  time  after  that  date.  Thus  the  time 
data  needed  at  occurrence  of  an  error  is  the 
cumulative  system  operating  time,  which  is 
available from elapsed time indicators (ETIs)
in  many  computer  systems.  In  the  absence 
of  ETIs,  cumulative  operating  time  can  often 
be reconstructed roughly from system operating
logs.
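
The synthesis described above amounts to a subtraction: cumulative system operating time at the error, less cumulative system operating time at the module's internal release. The sketch below shows one way to do this from a dated series of ETI readings. The function names, dates and readings are hypothetical, and the readings are assumed to be listed in date order.

```python
from bisect import bisect_right
from datetime import date


def eti_reading_on(eti_log, when):
    """Latest cumulative ETI reading (hours) recorded on or before `when`.

    eti_log is a list of (date, cumulative hours) pairs in date order.
    """
    dates = [reading_date for reading_date, _hours in eti_log]
    index = bisect_right(dates, when) - 1
    if index < 0:
        raise ValueError("no ETI reading on or before " + when.isoformat())
    return eti_log[index][1]


def module_use_hours(eti_log, release_date, error_date):
    """Cumulative module use time from internal release to the error."""
    return eti_reading_on(eti_log, error_date) - eti_reading_on(eti_log, release_date)


# Hypothetical ETI readings taken from a system operating log.
eti_log = [
    (date(1976, 1, 5), 1200.0),
    (date(1976, 2, 2), 1350.5),
    (date(1976, 3, 1), 1490.0),
    (date(1976, 3, 29), 1610.25),
]
hours = module_use_hours(eti_log, date(1976, 2, 2), date(1976, 3, 15))
print(hours)
```

When an error falls between two readings, as here, the result is only as precise as the reading interval, which is why an ETI reading entered on the TR form at the time of the error is preferable.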

The  TR  form  provides  space  adjacent  to 
block 17 to accept ETI readings from a
system  computer;  in  military  systems  having 
more  than  one  computer,  the  computer 
from  which  the  reading  is  taken  should  also 
be  completely  identified  when  the  reading 
is  recorded.  These  readings  should  be  capa¬ 
ble  of  rough  verification  (and  resolution  of 
disparities)  by  reference  to  the  system  oper¬ 
ating  log. 

9.2.2.5 Collecting Software Error Analysis
Information

A  brief  narrative  explanation  of  findings 
about  the  cause  of  the  error  should  appear 
on  the  TR  form.  A  summary  of  analysis 
results  is  also  needed  to  audit  the  accuracy 
of  the  error  category  assignment  in  block  2. 

9.2.2.6 Software Error Correction Data

Corrections  fall  into  five  general  cate¬ 
gories:  1)  code  changes,  2)  documentation 

changes,  3)  design  changes,  4)  core  size 
changes  and  5)  timing  changes.  A  brief 
narrative  explanation  of  correction  changes, 
if  any,  should  appear  as  part  of  the  complete 
TR  after  investigation  is  complete  and  cor¬ 
rective  action  determined. 

9.2.3  Data  Collection  from  Fleet  Service 

During  a  development  program,  interest 
centers  about  the  inherent  reliability  and 
availability  of  the  system,  since  that  is  gener¬ 
ally  the  characteristic  for  which  the  supplier 
is  accountable.  The  measurement  process 
is  principally  concerned  with  accurate  assess¬ 
ment  of  failure  and  error  rates,  but  the 
majority  of  system  problems  that  engender 
concern  in  fleet  service  center  about  failure 
events,  particularly  repetitive  failure  events, 
rather  than  failure  rates.  It  is  a  consequence 
of  the  definitions  of  inherent  reliability 
and  availability  that  many  of  these  failures  are 
properly  excluded,  by  reason  of  their  causes, 
from  consideration  in  computing  develop¬ 
ment  type  inherent  reliability  and  availability. 
Their presence in an operational system requires
a form of evaluation that 1) centers
about discrete failure events, 2) considers all
hardware failures and software errors regardless
of cause, 3) identifies significant trends,
patterns, and interactions, 4) focuses attention on
repetitive failures, 5) provides measures of
relative problem importance, 6) provides
measures of maintenance capability or problem
resolution, 7) evaluates effectiveness of
corrective actions, and 8) responds quickly
to current experience.

Contractors’  data  systems  should  be  struc¬ 
tured  to  support  the  evaluation  of  the  reliabil¬ 
ity  and  availability  of  systems  in  fleet  use  to 
the  extent  provided  by  contract.  Such  a  data 
system  must  be  able  to  accept  inputs  from 
ship’s  maintenance  and  operating  logs,  and 
from  the  Fleet  Ballistic  Missile  Weapon  Sys¬ 
tem  and  Strategic  Weapon  System  Trouble 
and Failure Report Program (SSPINST 3100.1F[2]).
Data from fleet service (e.g., figures
9-14  and  9-15)  can  be  used  directly  in  the 
reliability  and  availability  measurement  mod¬ 
els  given  herein  and  can  be  analyzed  and  re¬ 
ported  by  the  methods  of  this  manual. 

The  data  collection  function  must  assure 
that  codes  developed  to  represent  data  ele¬ 
ments  (failure  modes,  “when  discovered”, 
initiating  activities,  etc.)  are  sufficiently 
comprehensive  to  represent  any  permissible 
information  item  that  may  be  reported  con¬ 
cerning  an  operational  system.  This  may  in¬ 
clude  information  never  evolved  in  the 
factory,  such  as  failure  modes  unique  to  the 
installed  environment,  interfaces,  utilization 
of  spares,  generation  of  scrap,  elapsed  time  in 
service,  manhours  used  in  maintenance 
or  operation,  and  compliance  status  with 
respect  to  engineering  modifications. 


9.2.4  Training  for  Data  Collection 

Personnel  who  test  and  repair  hardware 
may  be  reluctant  to  record  all  of  the  informa¬ 
tion  available  to  them  unless  they  understand 
the  need  for  the  data.  Therefore,  a  program  is 
necessary,  however  informal,  to  explain  the 
importance  of  test  data  to  the  program  and 
to  motivate  personnel  at  the  data  source. 
Test  personnel  must  be  helped  to  realize  that 
their  functions  are  important  in  fulfilling  con¬ 
tractual  requirements.  To  accomplish  this,  a 
program  should  be  instituted  which  trains  and 
motivates  test  personnel.  The  training  should 
include a review of all data collection forms
and a discussion of the information to be recorded
on the forms. To promote accurate, complete
and legible recording of information, a procedure
should be established for immediate review of
completed forms for errors, omissions, or poor
legibility; for prompt return of deficient forms
to the originator for correction, with follow-up;
and for assuring that corrected forms are forwarded
to data control within permissible time frames.

Figure 9-14. Trouble and Failure Report Form

Figure 9-15. Elapsed Time Meter Record

9.3  DATA  CONTROL 

Personnel  assigned  to  data  control  must 
have  access  to  test  areas,  to  assure  that  all 


testing  is  recorded  and  all  forms  are  for¬ 
warded  to  data  control.  Block-controlled 
serial  numbering  of  data  collection  forms  is 
one  technique  for  preventing  test  personnel 
from omitting any data (forms) from the
data  system;  another  is  to  submit  to  the 
data  control  activity  a  checklist  containing 
the  drawing  numbers  and  serial  numbers  of 
items  scheduled  to  be  tested.  The  checklist 
is  compared  against  the  data  subsequently 
processed.  Disparities  are  referred  back  to 
data  originators  for  resolution. 

All completed forms should be manually
screened for gross errors in the data control
area. Screening includes verifying legibility,
correct system identification, correct recording
of data, and compliance with signature
requirements. Forms with errors in these categories
are returned to the originator for correction.

In  addition  to  the  manual  checks  made  on 
the  data,  a  computer  program  to  edit  the  data 
should  be  developed.  The  computer  should  be 
programmed  to  perform  accuracy,  validity, 
and  completeness  tests  on  the  data  and  to 
issue  an  error  list.  The  error  list  should  em¬ 
body  diagnostics  to  identify  the  type  of  error. 
Errors  should  be  returned  to  the  responsible 
operation for correction. Experience has
shown that this technique motivates the
personnel involved and that the quantity
of errors rapidly diminishes.

Computer tests for accuracy should assure
that numeric data do not appear in columns
reserved for alphabetic characters and vice
versa. They should also check ranges where
appropriate (e.g., day 1 to 31, month 1 to
12, year 00 to 99, minutes 00 to 60, level 1
to 7, etc.). The computer completeness
check should assure that all required data are
present for each entry (additional data are
required when a failure occurs). The validity
checks are used to monitor the test results
(time or cycles); permissible ranges for the
time for states A, B and D and cycles for
state C may be inserted into the computer.
Data reported outside these limits would be
returned to the test activity to be verified;
e.g., test personnel might report four hours
for a vibration test when the permissible
range for the test is 15 to 30 minutes.
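
The three kinds of computer check described above (accuracy, completeness, validity) can be sketched as a single edit routine that returns an error list with diagnostics. This is a minimal illustration only: the field names, the required-field list, and the limits table are hypothetical, with the numeric ranges and the vibration-test example taken from the text.

```python
# Field names, ranges, and limits below are hypothetical illustrations.
RANGES = {"day": (1, 31), "month": (1, 12), "year": (0, 99), "level": (1, 7)}
DURATION_LIMITS_MIN = {"vibration": (15, 30)}  # permissible test time, minutes
REQUIRED = ("serial", "test_type", "day", "month", "year", "duration_min")


def edit_record(rec):
    """Return diagnostics for one test record; an empty list means no errors found."""
    errors = []
    # Completeness: every required field must be present and non-blank.
    for name in REQUIRED:
        if str(rec.get(name, "")).strip() == "":
            errors.append(f"MISSING {name}")
    # Additional data are required when a failure occurs.
    if rec.get("failed") and not rec.get("failure_report_no"):
        errors.append("MISSING failure_report_no")
    # Accuracy: numeric fields must hold digits only and fall within range.
    for name, (low, high) in RANGES.items():
        text = str(rec.get(name, "")).strip()
        if text == "":
            continue  # absence is reported by the completeness check
        if not text.isdigit():
            errors.append(f"NOT NUMERIC {name}={text}")
        elif not low <= int(text) <= high:
            errors.append(f"OUT OF RANGE {name}={text} (allowed {low}-{high})")
    # Validity: the reported duration must be plausible for the test type.
    limits = DURATION_LIMITS_MIN.get(rec.get("test_type"))
    duration = str(rec.get("duration_min", "")).strip()
    if limits and duration.isdigit() and not limits[0] <= int(duration) <= limits[1]:
        errors.append(f"SUSPECT duration_min={duration} (expected {limits[0]}-{limits[1]})")
    return errors


# Four hours (240 minutes) reported for a vibration test, and an impossible month.
record = {"serial": "A123", "test_type": "vibration", "day": "15",
          "month": "13", "year": "79", "duration_min": "240"}
for line in edit_record(record):
    print(line)
```

Each diagnostic names the field and the kind of error, so the listing can be returned to the responsible operation without further interpretation.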

While  an  error  list  is  capable  of  detecting 
errors  in  reported  data,  it  cannot  detect  the 
complete omission of data. The purpose of
the checklist is to determine periodically
whether all data are being reported. A list of all
serialized  hardware  scheduled  for  test  should 
be  prepared  and  fed  into  the  computer.  The 
output  report  indicates  the  test  data  which 
have  not  been  received.  A  study  can  then  be 
addressed  to  determine  why  the  data  were 
omitted.  Corrective  measures  to  close  the 
loop  can  then  be  instituted. 
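
The two omission checks discussed in this section, block-controlled serial numbering of forms and the checklist of hardware scheduled for test, both reduce to comparing what was expected against what was received. The sketch below assumes forms carry integer serial numbers and that hardware is identified by a (drawing number, serial number) pair; the function names are hypothetical.

```python
def missing_form_numbers(first, last, received):
    """Form numbers issued in the block first..last that never reached data control."""
    return [n for n in range(first, last + 1) if n not in received]


def unreported_items(scheduled, reported):
    """(drawing number, serial number) pairs scheduled for test but absent from the data."""
    return sorted(set(scheduled) - set(reported))


# Forms 100 through 105 were issued; four came back.
print(missing_form_numbers(100, 105, {100, 101, 103, 105}))

# Two items were scheduled for test; data arrived for only one.
scheduled = [("D-4471", "0001"), ("D-4471", "0002")]
reported = [("D-4471", "0001")]
print(unreported_items(scheduled, reported))
```

The output of either function is the list of disparities to be referred back to data originators for resolution.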

9.3.1  Hardware  Data  Control 

The  general  data  control  function  monitors 
reported  data  for  timeliness,  completeness, 
and  accuracy,  and  then  converts  the  data  for 
machine  processing  when  appropriate.  Data 


control  must  designate  pick-up  and  delivery 
points,  schedules,  distribution  and  account¬ 
ability  for  data  received  for  processing  within 
the  data  system.  Procedures  must  be  estab¬ 
lished  for  reproducing,  distributing,  collec¬ 
ting,  editing  and  filing  data  forms  emanating 
from  test  areas  and  fleet  service.  These  pro¬ 
cedures  should  include  the  manual  and 
automatic  methods  for  screening  the  reported 
data  for  compliance  with  requirements  and 
providing  for  corrections  of  errors  in  the 
reported  data. 

Working  with  the  data  processing  activity, 
data  control  should  determine  those  data 
fields  that  can  be  checked  by  computer  pro¬ 
grams,  the  applicable  checking  logic,  and 
information  apart  from  the  data  itself  needed 
to  check  for  errors.  Such  information  must 
be  furnished  by  data  control,  using  specific 
project  requirements  established  by  analysis. 

Information  needed  to  check  for  errors 
in  collected  data  consists  of  all  acceptable 
hardware  identifications  and  test  descriptions, 
alpha-numeric  fields,  identification  of  varia¬ 
bles  including  durations,  ranges  of  variables, 
and  criteria  for  determining  when  specific 
data  fields  should  appear  or  remain  blank  on 
the  originating  documents  (e.g.,  if  a  failure 
is  recorded  on  a  test  form,  it  should  be 
accompanied  by  reference  to  a  failure  report 
number).  For  externally  originated  data, 
complete  correction  of  errors  may  repre¬ 
sent  a  difficult  problem  for  a  contractor’s 
data  control  office.  Thus,  attention  should  be 
given  in  data  processing  to  rendering  the  sys¬ 
tem  outputs  as  insensitive  as  possible  to  data 
elements  having  high  error  rates,  and  to 
synthesizing  descriptive  statistics  that  are 
likely  to  be  approximately  correct  despite 
undetected  errors  in  the  data  base. 

Source  documents  should  be  marked  to 
indicate  that  essential  information  has  been 
extracted;  this  can  be  accomplished  by  stamp¬ 
ing  the  source  document  or  punching  a  hole 
at  a  designated  location.  Data  fields  or  blocks 
in  the  documents  that  are  not  properly  filled 
in,  or  that  contain  information  that  cannot  be 
interpreted,  can  be  circled  in  red.  Processed 
documents  should  then  be  copied  and  originals 
returned  to  the  originating  activity  for  error 
correction  as  required.  A  suspense  file  must 
be  provided  to  assure  that  all  detected  data 
errors  are  corrected  prior  to  final  processing 
and that no data are lost from the system
in the error correction loop. The file should
contain  a  record  of  all  errors  that  have  not 
been  corrected.  When  the  data  are  to  be  used 
as  inputs  to  a  computerized  processing 
function,  formatting  instructions  and  tabular 
information  for  automatic  error  editing  and 
validity  checking  must  be  established. 

Management plays a key role in the
data  control  process  by  actively  reviewing 
the  data  system  (at  least  weekly)  and  em¬ 
phasizing  the  importance  of  the  data.  Manage¬ 
ment  should  carefully  monitor  the  suspense 
file  and  correct  personnel  deficiencies. 

9.3.2  Software  Data  Control 

9.3.2.1 Review of Software Trouble
Reports  at  Close-Out 

The  process  of  documenting  software 
errors  is  itself  prone  to  numerous  errors,  as 
is  the  process  of  documenting  hardware 
failures  and  operating  time.  Thus  it  is  essen¬ 
tial  that  an  editorial  review  of  each  completed 
software  trouble  report  (TR)  be  included  as  a 
formal  step  in  the  process  of  closing  out  the 
report.  Particular  attention  should  be  given 
to  assuring  the  accuracy  and  completeness  of 
figure  9-13  blocks  2,  3,  12,  13,  17,  25  through 
27,  if  applicable,  28  and  29,  which  contain 
data  needed  for  reliability  assessment. 

9.3.2.2  Editorial  Review  of  Software  Error 
Category  and  Severity  Assignments 

The  three-character  error  category  code 
assigned  to  the  error  by  the  analyst  or  anal¬ 
ysis  team  must  be  reviewed  for  correctness  at 
close-out.  The  criticality  score  assigned  in 
block  3  of  the  TR  should  be  reviewed  for  ac¬ 
curacy.  While  reliability  indices  can  be  com¬ 
puted  relative  to  criticality  categories  indi¬ 
vidually  or  in  combination,  as  a  minimum, 
a  single  set  of  indices  should  be  computed 
using  high  and  medium  criticality  error  events 
taken  together  (or  a  judicious  combination 
of the highest priority levels of MIL-STD-1679[3]
if the software errors have been
categorized  in  accordance  with  that  standard). 
This  gives  meaningful  metrics  for  character¬ 
izing  the  reliability  status  and  growth  of  the 
software  relative  to  its  intended  mission 
requirements. 


9.3.2.3  Normalization  of  Software  Operating 
Time  Data 

Cumulative  operating  hours  following  re¬ 
lease  is  also  an  essential  data  element  for  soft¬ 
ware  reliability  calculations.  In  general,  it 
must  be  computed  because  not  all  of  the 
software  comprising  a  system  begins  testing  at 
the  same  time.  And  it  has  been  found  that  the 
concepts  of  mission  stress  and  duty  cycle 
which  apply  to  hardware,  have  analogs  in  the 
progressive  testing  environments  of  software 
development  programs.  Thus,  it  is  sometimes 
necessary  to  apply  multipliers  to  system  oper¬ 
ating  times  accrued  in  various  test  phases, 
multipliers  that  account  for  the  varying 
effectiveness  of  the  test  phases  in  detecting 
latent  software  errors. 
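
As a minimal sketch of this normalization, the operating time accrued in each test phase can be weighted by a phase multiplier and summed, giving T = Σ k_p t_p. The phase names and multiplier values below are hypothetical; actual multipliers would have to be established by project analysis.

```python
# Hypothetical multipliers reflecting how effectively each test phase
# exposes latent software errors (1.0 = the reference phase).
PHASE_MULTIPLIER = {"unit_test": 0.5, "integration": 1.0, "system_test": 1.5}


def normalized_hours(log):
    """Sum operating hours after weighting each log entry by its phase multiplier.

    log is a list of (phase name, operating hours) pairs.
    """
    return sum(PHASE_MULTIPLIER[phase] * hours for phase, hours in log)


log = [("unit_test", 40.0), ("integration", 100.0), ("system_test", 20.0)]
print(normalized_hours(log))  # 0.5*40 + 1.0*100 + 1.5*20
```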

Another  problem  stems  from  the  frequent 
presence  of  multiple  elapsed-time  meters, 
which  in  a  real-time  system  should,  but  do 
not  always,  accumulate  time  at  the  same  rate. 
Often,  too,  a  meter  is  not  read  when  a  failure 
occurs,  due  to  oversight  or  because  test  per¬ 
sonnel  consider  their  reliability  data  genera¬ 
tion  function  ancillary  to  their  other  duties. 
Sometimes readings are not continuous because
the  meters  are  an  integral  part  of  a  removable 
subassembly  (poor  design  practice).  Thus,  an 
important  part  of  the  editing  and  control 
function  is  to  compute  the  applicable  cumula¬ 
tive  operating  time  when  a  software  error  was 
first  detected.  Because  of  the  project-unique 
nature  of  this  computation,  and  because  it 
may  require  cross-references  to  system  logs,  it 
is  best  performed  manually  by  data  control 
personnel  editing  the  TR.  In  any  event,  an 
actual  or  estimated  cumulative  time  at  occur¬ 
rence  must  be  available  for  each  software 
error  entered  in  the  data  system. 

As  in  hardware  data  control,  management 
must  actively  evaluate  the  software  data 
system. 

9.4  DATA  PROCESSING 

The data processing function formats
screened data, records it on machine-readable
media, and generates and maintains a data history
file on a storage medium for hardware
items at component level and above, and for
software items at the "module" level and
above. It uses the accumulated data to
generate summary reports which are used by
the reliability analysis activity to evaluate and
report the reliability and availability of the
system under development or in fleet service.

When errors are detected in the data during
processing, listings are prepared explicitly defining
the errors and are sent to data control.
Data control must then secure correction of
the errors, either by making the corrections
directly or by forwarding the source documents
to their originators. This action must
be expeditious so that the originator can accurately
reconstruct the reported occurrence. A
suspense file of such errors is maintained in
data control to assure that no data are lost
from the system in the error-correction loop.
Corrected data are again prepared for processing
and entered into the next processing
cycle.

It is essential that the data processing function
provide the capability to sort on all fields
of data, the capability to delete old data (sort
by date, lot, type of test, etc.), and the capability
to correct errors in the data base.

Tasks involved in establishing the data processing
function are: 1) determine types and
quantities of information to be processed,
2) determine the amount of auditing (computerized
and manual error checking) to be
performed on data, 3) perform data processing
systems analysis, 4) issue computer
programming specifications, 5) prepare and
verify computer programs, and 6) issue operating
instructions and procedures. In accomplishing
these tasks, the data processing activity
should work with groups providing inputs
to the data system and with groups that
utilize outputs of the data system.

Generation of a history file, formulated
and sequenced for use in preparing RMA
reports, may require multiple computer runs,
because of limitations imposed by the size
of the computer or by the number of storage
units available. Typically, five runs are
needed: 1) format standardization, 2) preliminary
processing, 3) project selection,
4) updating and compiling, and 5) a final
computer run to provide output reports
as defined by the data utilization function.
Development of data manipulations to be
performed in each of the computer runs is
discussed below. Descriptions of typical
output reports can be found in section 10.

9.4.1 Format Standardization

The  data  input  formats  must  be  rearranged 
into  a  standard  record  format  for  future 
processing.  This  permits  the  testing  areas 
to  use  the  forms  most  suitable  to  their  needs, 
providing  they  furnish  the  common  and  con¬ 
sistent  information  required  by  the  data 
collection  activity.  A  standard  fixed-length 
record  format  provides  ease  of  processing  and 
eliminates  the  need  to  correlate  information 
items  later  with  their  locations  on  the  record. 
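
A standard fixed-length record can be illustrated with a small layout table that fixes the position and width of every field, so that any input form can be packed into the same record and unpacked by position alone. The field names and widths below are hypothetical and chosen only for illustration.

```python
# Hypothetical layout: (field name, width in characters); total length is 30.
LAYOUT = [("project", 8), ("serial", 6), ("test_type", 10), ("hours", 6)]
RECORD_LENGTH = sum(width for _, width in LAYOUT)


def pack(fields):
    """Build one fixed-length record, left-justified and space-padded."""
    parts = []
    for name, width in LAYOUT:
        text = str(fields[name])
        if len(text) > width:
            raise ValueError(f"{name} exceeds {width} characters: {text}")
        parts.append(text.ljust(width))
    return "".join(parts)


def unpack(record):
    """Split a fixed-length record back into its fields by position."""
    if len(record) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} characters, got {len(record)}")
    fields, position = {}, 0
    for name, width in LAYOUT:
        fields[name] = record[position:position + width].rstrip()
        position += width
    return fields


record = pack({"project": "TRIDENT", "serial": "000123",
               "test_type": "VIBRATION", "hours": "0.5"})
print(repr(record))
print(unpack(record))
```

Because every record has the same length and field positions, later processing steps need only the layout table, not knowledge of the originating form.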


9.4.2 Preliminary Processing

In this step, test and failure data are reviewed
to verify the accuracy, conciseness,
completeness and validity of the information
processed. Review may be manual or by
EDIT routines or both. Error listings should
be designed to be easily readable by personnel
who will do manual error corrections. Because
test results collected for a particular
system under test may also be applicable to
other systems employing the same types of
hardware, provisions and criteria should be
included in the data system for transferring
test data among development programs.

In some instances, reliability or availability
assessment may be required on hardware
at subcomponent levels, but these subcomponents
may never be tested as separate devices,
only as portions of larger devices.
Therefore, the preliminary processing step
may need to provide for estimating test
results from component level down to subcomponent
level. To accomplish this, the
computer must be furnished information on
the subcomponent population of the affected
components (configuration information).

For example, if component type A contains
one each of three types of subcomponents
(A1, A2, A3), and if a test is performed
involving two of the subcomponents, A1 and
A3, but not A2, then only those two subcomponents
are assigned operating time or cycles.
The third subcomponent, A2, is assigned an
amount of non-operating time equal to the
operating time assigned to the responsible
subcomponent.

9.4.3 Project Selection

In  this  step  the  processing  program  selects 
and groups data by development project,
e.g., POSEIDON, TRIDENT.

9.4.4  Updating  and  Compiling 

Processed  test  data  records  will  be  used  to 
update  the  existing  test  history  file.  Regard¬ 
less  of  the  effort  expended  to  assure  accuracy 
of  test  data,  errors  are  to  be  expected  in  the 
history  files.  It  is  necessary,  therefore,  to 
provide  a  means  for  correcting  errors  in  the 
files  during  this  step. 

Since  a  history  file  is  updated  only  when 
reports  for  its  project  are  required,  provisions 
should  be  made  to  generate  several  optional 
reports  in  addition  to  the  composite  reliabil¬ 
ity  and  availability  status  report,  which  is  the 
basic  output.  Optional  reports  will  usually 
consist  of  summaries  of  test  results  across  or 
within  various  categories;  examples  are  sum¬ 
maries  within  each  serial  number  or  across  all 
test  environments. 

Data  processing  functions  also  include 
computation  of  summary  statistics  for  soft¬ 
ware  errors,  as  well  as  reliability  indices  for 
individual  modules  and  for  the  software 
system.  Summary  statistics  are  of  value  to 
management  in  evaluating  the  software  devel¬ 
opment  process  and  identifying  areas  in  need 
of  extra  attention  or  control.  Analyses  of 
value  include  frequency  plots  of  software 
error  categories,  error  sources,  and  error 
severities  for  each  software  module,  inter¬ 
module  error  rate,  and  percentage  of  abnor¬ 
mal  terminations. 

For  systems  in  fleet  service,  the  data  pro¬ 
cessing  function  should  provide  certain  data 
manipulations  and  outputs  beyond  those 
provided for systems in development. These
additional outputs may include computed
operational reliability and availability, listings
of repetitive failures, evaluations of the effectiveness
of corrective actions, and ordered listings
of high failure rate components.

9.4.5 Operational Indices

Operational  RMA  indices  are  computed  us¬ 
ing  models  selected  from  Sections  4  through 
7. 


9.4.6  Failure  History 

The failure summary report (§ 10.2.2) provides
one line of information for each failure
that  occurs.  Special  consideration  should  be 
given  to  the  following  concerns. 

9.4.6.1 Repetitive Failures

Contingency  tables  and  other  forms  of  sta¬ 
tistical  analysis  can  be  programmed  into  the 
processing  function  to  evaluate  the  signifi¬ 
cance  of  recurring  failures  and  failure  modes. 
The predicted partial failure rate of a system
with respect to a specific mode of failure
or a specific location, as predicted during
system analysis, is an a priori estimate of the
relative frequency to be expected of the mode
or of the part in that location. By comparing
the expected frequency with the observed
frequency in a contingency table, the significance
of a repetitive mode can be assessed
using the chi-square (χ²) statistic. A listing
of repetitive component or parts failures,
especially  multiple  usage  components  or 
parts,  provides  the  analyst  with  a  necessary 
tool  to  detect  reliability  problems  not  neces¬ 
sarily  detected  by  utilizing  other  printouts. 
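
The comparison of expected and observed frequencies can be sketched with the Pearson chi-square statistic, the sum over failure modes of (observed - expected)² / expected. The counts and predicted shares below are invented for illustration. The sketch assumes the predicted shares are fully specified in advance, so the degrees of freedom are the number of modes minus one; for three modes at the 5 percent level the tabulated critical value is 5.991.

```python
def chi_square(observed, predicted_share):
    """Pearson chi-square statistic for observed counts against predicted shares.

    observed: failure counts per mode; predicted_share: a priori fractions summing to 1.
    """
    total = sum(observed)
    statistic = 0.0
    for count, share in zip(observed, predicted_share):
        expected = total * share  # expected count for this failure mode
        statistic += (count - expected) ** 2 / expected
    return statistic


# Hypothetical data: 50 failures across three modes; mode 1 recurs more than predicted.
observed = [30, 10, 10]
predicted_share = [0.4, 0.3, 0.3]
statistic = chi_square(observed, predicted_share)
CRITICAL_5PCT_2DF = 5.991  # tabulated chi-square value, 2 degrees of freedom
print(round(statistic, 3), statistic > CRITICAL_5PCT_2DF)
```

A statistic above the critical value indicates that the observed pattern of repetition is unlikely to be chance variation about the predicted frequencies.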

9.4.6.2  High  Failure-Rate  Components 

The  data  processing  activity  should  provide 
programming  to  highlight  components  that 
experience  failure  rates  significantly  higher 
than  predicted.  Provision  should  be  made 
to  rank  these  items  in  order  of  importance. 
Ranking by effect on system performance uses a
combination of problem criticality and probability
of occurrence, while ranking by maintenance burden
is based on impacts such as cost, spares, and
maintenance manhours involved.
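
One simple way to highlight such components is to compute the ratio of observed to predicted failure rate and list the worst offenders first. The sketch below is an assumption-laden illustration: the record fields and the threshold of 2.0 are hypothetical, and a plain ratio is not a statistical significance test, which would also need to account for the amount of operating time behind each estimate.

```python
def rank_components(items, threshold=2.0):
    """Return (name, ratio) pairs whose observed/predicted rate meets the threshold,
    ordered from worst to best."""
    flagged = []
    for item in items:
        observed_rate = item["failures"] / item["hours"]  # failures per hour
        ratio = observed_rate / item["predicted_rate"]
        if ratio >= threshold:
            flagged.append((item["name"], ratio))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical component histories.
items = [
    {"name": "A", "failures": 4, "hours": 1000.0, "predicted_rate": 0.001},
    {"name": "B", "failures": 1, "hours": 1000.0, "predicted_rate": 0.002},
    {"name": "C", "failures": 6, "hours": 2000.0, "predicted_rate": 0.001},
]
for name, ratio in rank_components(items):
    print(name, round(ratio, 1))
```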

9.4.6.3  Effectiveness  of  Corrective  Actions 

Availability of a continuous time and failure
history enables a cumulative failure incidence
curve to be plotted for a system in service.
Such a curve is simply a graph of cumulative
failures versus cumulative operating
time in service, a non-decreasing step function.
The effective date of specific corrective
actions or system modifications can be noted
on the curve, permitting ready visual comparison
of performance before and after such
actions. The height of the risers, or the average
slope in various regions of the curve, indicates
performance trends; an effective corrective
action program tends to be reflected in
a curve that is concave downward.
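
The curve and the before-and-after comparison can be sketched as follows. The failure times are invented, and the average slope on each side of the corrective action is used as a simple numerical stand-in for the visual comparison described above.

```python
def cumulative_curve(failure_times):
    """Points (cumulative operating time, cumulative failures) of the step function."""
    return [(t, n) for n, t in enumerate(sorted(failure_times), start=1)]


def average_slopes(failure_times, change_time, end_time):
    """Average failures per hour before and after a corrective action at change_time."""
    before = sum(1 for t in failure_times if t <= change_time)
    after = len(failure_times) - before
    return before / change_time, after / (end_time - change_time)


# Hypothetical history: corrective action effective at 500 hours, data through 1500 hours.
failure_times = [100.0, 200.0, 300.0, 400.0, 500.0, 900.0, 1500.0]
print(cumulative_curve(failure_times))
print(average_slopes(failure_times, change_time=500.0, end_time=1500.0))
```

A lower slope after the change than before corresponds to the concave-downward shape that an effective corrective action program tends to produce.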

9.5  DATA  UTILIZATION 

9.5.1  Hardware  Data  Utilization 

The  data  utilization  function  defines  and 
uses  the  outputs  of  the  data  processing  to 
provide  various  summary  reports  and  com¬ 
posite  reliability  and  maintainability  status 
reports.  Function  flow  for  each  type  of 
output  is  discussed  below. 

9.5.1.1 Composite RMA Data Output

Data  processing  must  provide  outputs  (e.g., 
computer),  as  requested  by  the  user  activity, 
to  support  the  reliability,  maintainability  and 
availability  measurement  program.  The  out¬ 
puts  must  enable  solution  of  the  system 
statistical  models.  Reliability,  maintainability 
and  availability  indices  are  needed  for  each 
type  of  hardware  in  each  mission  environ¬ 
ment.  Processing  must  relate  test  data  from 
the  history  files  to  the  mission  file;  this  may 
require  use  of  the  “alpha”  conversion  factors 
described in § 4.1.2.3 (multipliers, in units
of  mission  per  unit  time,  used  to  normalize 
test  information  to  a  standard  mission  as  a 
common  time  base).  Confidence  limits  on  the 
calculated  indices  may  also  be  required. 
Apportioned  and  predicted  reliabilities,  main¬ 
tainabilities  and  availabilities  stored  in  the 
computer  are  recovered  for  comparison  with 
measured  indices. 
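
As a rough sketch of how such conversion factors might be applied, test time accrued in each environment can be multiplied by that environment's alpha (missions per unit time) and summed to give equivalent standard missions. This illustrates only the unit conversion stated above; it is an assumption about usage, not the procedure of § 4.1.2.3, and the environment names and values are invented.

```python
def equivalent_missions(exposure):
    """exposure maps environment -> (alpha in missions per hour, operating hours)."""
    return sum(alpha * hours for alpha, hours in exposure.values())


def failures_per_mission(failures, exposure):
    """Observed failures divided by equivalent standard missions."""
    return failures / equivalent_missions(exposure)


# Hypothetical test history in two environments.
exposure = {"dockside": (0.01, 500.0), "patrol": (0.02, 1000.0)}
print(equivalent_missions(exposure))
print(failures_per_mission(5, exposure))
```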

9.5.1.2 Summary Data Outputs

Data  processing  must  be  capable  of  re¬ 
trieving  data  and  providing  summaries  of 
data  in  the  history  files.  Figure  9-16  provides 
a data flow for processing data from the history
file to provide summaries. Summaries
that  use  file  data  directly  should  be  generated 
during  file  maintenance  and  updating,  and 
made  available  at  users'  option.  At  times, 
special purpose data summaries may be
desired. For example, a compendium of failure
rates (§ 10.2.7) can be prepared by summarizing
failure and test data against general
environment  categories  for  specific  or  general 
types  of  hardware.  To  do  this,  the  data  base 
must  be  sufficiently  extensive,  and  certain 
additional  information  must  also  be  available, 
in  order  to  consolidate  environments  and 
hardware  items  into  valid  general  categories. 

Since management actions by the contractor
or user may be major objectives of much
of the output of RMA evaluation, the data
utilization  function  should  emphasize  graphic 
presentations  and  other  techniques  to  high¬ 
light  problems  requiring  action.  In  particular, 
actions  requiring  long  lead-times  or  having 
broad  system  impact  (e.g.,  on  logistic  support 
or  manning  needs)  should  be  singled  out  and 
highlighted  to  data  system  users  at  the  earliest 
possible  moment. 


9.5.2  Software  Data  Utilization 

9.5.2.1 Summary Software Error Statistics

Summary  statistics  should  be  computed 
monthly  or  quarterly,  as  warranted  by  the 
quantity  of  data  processed,  for  reporting  to 
management.  During  periods  of  intense 
software  development  activity,  weekly  reports 
may  be  desired.  In  addition  to  the  summaries 
discussed  below,  management  may  benefit 
from  error  breakdowns  by  software  function¬ 
al  area  and  other  categories. 


9.5.2.2 Frequency Analysis of Software
Error  Categories 

Errors  experienced  and  reported  in  devel¬ 
opment  should  be  summarized  by  major 
category  (1st  character  in  block  2  of  figure 
9-13) and reported in tabular form as percentage
of total errors or in histogram form. Both
forms of presentation are based on the simple
statistic X_i/ΣX_i, where X_i is the number of
errors classified in the ith category and ΣX_i
is the total number of error events reported.
Figures 9-17 and 9-18 provide examples of both
forms of summary report. (Statistics presented in
figures 9-17 and 9-18 are examples used to
illustrate reporting format.)

Figure 9-17. Example of Percentage Breakdown of Major Error Categories

Figure 9-18. Example of a Histogram of Major Error Categories

9.5.2.3 Frequency Analysis of Software
Error  Sources 

Similar  frequency  statistics  should  be  sum¬ 
marized  and  reported  by  error  source  code 
(2nd  character  in  block  2  of  figure  9-13). 
Figure  9-19  illustrates  a  typical  presentation 
format  for  a  large  software  system  comprised 
of four major functional areas: applications
software,  simulator  software,  operating  sys¬ 
tem  and  test  tools. 
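
Both the category breakdown and the source breakdown rest on the same calculation: the count in each class divided by the total number of errors. The sketch below assumes each TR carries the three-character code from block 2 as a string, with the major category in the first character and the source in the second; the sample codes are invented.

```python
from collections import Counter


def percentage_breakdown(codes, position):
    """Percentage of total errors per class, keyed on one character of the code.

    position 0 selects the major category, position 1 the error source.
    """
    counts = Counter(code[position] for code in codes)
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in sorted(counts.items())}


def text_histogram(percentages, points_per_mark=5.0):
    """One row per class, one '*' per points_per_mark percentage points."""
    return "\n".join(f"{key} {'*' * round(pct / points_per_mark)} {pct:.1f}%"
                     for key, pct in percentages.items())


# Hypothetical block 2 codes from four trouble reports.
codes = ["A12", "A31", "B10", "C22"]
by_category = percentage_breakdown(codes, position=0)
print(by_category)
print(text_histogram(by_category))
print(percentage_breakdown(codes, position=1))
```

The same table serves the tabular report directly and, through the histogram routine, the graphic form.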

9.5.2.4  Intermodule  Error  Rate 

Intermodule  error  rate  is  computed  by 
counting  the  number  of  software  modules 
affected  by  each  error,  as  entered  in  block  12 
of  figure  9-13.  The  summary  statistic  re¬ 
ported  is  the  percentage  of  errors  involving 
n modules, where n = 1, 2, and so on. Figure 9-20
shows  the  form  of  the  reported  statistics. 
As  shown  in  figure  9-20,  84.8  percent  of  the 
project  A  errors  affected  one  module,  10.4 
percent  affected  two  modules,  etc.  (The 
percentages  shown  are  an  example  computed 
from  actual  test  data  for  highly  modular 
military  software  developed  using  top  down 
programming  techniques.) 
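
A minimal sketch of this computation follows, assuming block 12 of each TR has been transcribed as a list of affected module names (a hypothetical representation). The function counts the distinct modules per error and reports the percentage of errors at each count.

```python
from collections import Counter


def intermodule_error_rate(affected_modules):
    """Percentage of errors involving n modules, for each n that occurs.

    affected_modules holds one list of module names per reported error.
    """
    counts = Counter(len(set(modules)) for modules in affected_modules)
    total = len(affected_modules)
    return {n: 100.0 * count / total for n, count in sorted(counts.items())}


# Hypothetical block 12 entries from four trouble reports.
affected_modules = [["NAV"], ["NAV"], ["FIRE"], ["NAV", "FIRE"]]
print(intermodule_error_rate(affected_modules))
```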

9.5.2.5  Termination  and  Severity  Statistics 

The  percentage  of  normal  versus  abnormal 
terminations  should  be  tabulated  and  re¬ 
ported,  using  data  from  blocks  25,  26,  27  of 
the  TR  form  (figure  9-13).  An  example  is 
given in figure 9-21, which shows 90.3% normal
and 9.1% abnormal terminations. The
level of severity is shown in figure 9-22, which
shows 8% high, 31% medium, 41% low and
20% not applicable.

9.5.2.6  Software  Reliability  Statistics 

Figure  9-23  illustrates  a  report  of  software 
reliability  growth  status.  Tabular  reports 
may  be  supplemented  by  graphic  reporting 
formats  depicting  reliability  growth  history. 

9.5.3 Failure Data Utilization

Both  hardware  and  software  failure  data  (a 
software  error  when  detected  is  a  software 
failure)  are  essential  inputs  to  the  corrective 
action  system.  Each  failure  must  be  reported 
and  investigated.  The  results  of  the  investiga¬ 
tion  determine  the  need  for  analysis  and  cor¬ 
rective  action. 

Reports such as those described in § 10.2
are useful in ensuring that failures are properly
reported and followed (investigation, analysis,
corrective action) until properly closed
out.

Figure 9-19. Example of Source Frequency Breakdown

Section 10

RMA  DOCUMENTATION 


The  documentation  described  in  this  sec¬ 
tion  will  normally  be  generated  by  the  prime 
contractor to fulfill RMA requirements stipulated
in NAVSEA OD 21549[1]. This documentation
provides for the planning, recording,
compiling, analyzing and reporting of
pertinent  RMA  data  and  information.  The  de¬ 
velopment  of  this  documentation  should  be 
accomplished  in  a  systematic  and  timely  man¬ 
ner.  This  means  that  RMA  documentation 
needed  for  system  design  decisions  and  pro¬ 
gram  management  decisions  is  available  in 
time  to  facilitate  and  support  those  decisions, 
and  that  documentation  which  reports  RMA 
evaluation  program  results  is  available  at 
scheduled  time  points  to  effectively  monitor 
program  progress. 

The documentation described in this section
is divided into two categories: those
prepared  by  the  contractor  for  submittal  to 
SSPO  for  approval  or  information,  and  those 
prepared  by  the  contractor  for  “in-house” 
use,  including  those  containing  information 
needed  to  prepare  reports  for  SSPO. 

10.1  DOCUMENTS  FOR  SUBMITTAL 

TO  SSPO 

RMA  evaluation  program  documentation 
and  subsequent  changes  thereto  normally 
submitted  to  SSPO  for  information  or  ap¬ 
proval  include: 

•  Plans: 

•  Reliability  Evaluation  Plan 

•  Maintainability  Program  Plan 

•  Reliability  Demonstration  Test  Plan 

•  Maintainability  Demonstration  Test 
Plan 

•  Test  Procedures: 

•  Reliability  Demonstration  Test  Pro¬ 
cedure 

•  Maintainability  Demonstration  Test 
Procedure 


•  Reports: 

•  Reliability  Prediction  Reports  (Initial, 
Intermediate  and  Final) 

•  Reliability  Status  Reports 

•  Maintainability  Status  Reports 

Since Appendix B of NAVSEA OD 21549[1]
and the Contract Data Requirements List
(CDRL) DD Form 1423 are used to tailor
NAVSEA OD 21549 requirements to each
specific procurement, contractual requirements
for a given project may not require that
all of the above documents be submitted to
SSPO.  Nevertheless,  the  general  contents  of 
each  of  the  above  documents  are  described 
herein  to  cover  the  most  stringent  set  of  re¬ 
quirements. 

10.1.1  Plans 

Availability  is  inherently  a  function  of  a 
system’s  reliability  characteristics,  and  if  the 
system  is  repairable,  it  is  also  a  function  of 
the  system’s  maintainability  characteristics. 
Since  SSPO  establishes  stringent  reliability 
and  availability  objectives,  contractor  ap¬ 
proaches  to  the  achievement  of  these  objec¬ 
tives  are  of  vital  interest  and  concern  to 
SSPO.  Accordingly,  SSPO  requires  the  con¬ 
tractor  to  document  his  plans  for  meeting 
specified RMA system objectives. The content
of these plans is described in § 10.1.1.1
and 10.1.1.2 below. The content of contractor
plans for verifying achievement of
RMA objectives is described in § 10.1.1.3
and 10.1.1.4.
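
The dependence of availability on both reliability and maintainability can be illustrated with the standard steady-state (inherent) availability relation. The relation and the names MTBF and MTTR are assumptions introduced here for illustration; they are not quoted from this section, and the numerical values are hypothetical.

```python
def inherent_availability(mtbf_hours, mttr_hours):
    """Steady-state inherent availability of a repairable system.

    mtbf_hours: mean time between failures (a reliability characteristic)
    mttr_hours: mean time to repair (a maintainability characteristic)
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical values: 500 hours between failures, 2 hours to repair.
availability = inherent_availability(500.0, 2.0)

# Halving the repair time raises availability without changing reliability.
improved = inherent_availability(500.0, 1.0)
```

The second call shows why maintainability plans are of interest alongside reliability plans: availability can be improved by shortening repair time even when the failure rate is unchanged.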


10.1.1.1 Reliability Evaluation Plan

The  Reliability  Evaluation  Plan  should 
delineate  those  approaches  and  methods 
which  the  contractor  proposes  to  use  to  ful¬ 
fill  contractual  reliability  evaluation  program 
requirements. These requirements will normally
include: (1) the preparation of reliability
block diagrams; (2) apportionment of
system reliability objectives and minimum
acceptable reliability for mission phases to
system elements; (3) during the development
phase,  periodic  predictions  of  the  reliability 
expected  in  the  system  design  eventually  re¬ 
leased  for  full  scale  production;  (4)  the  iden¬ 
tification  of  failure  modes  that  could  abort 
the  system’s  mission  using  the  technique  of 
failure  mode,  effects  and  criticality  analysis 
(FMECA);  (5)  provisions  for  the  collection  of 
essential  reliability  data;  (6)  assessment  of 
system  reliability  during  the  development 
phase  and  continuing  through  subsequent  life 
cycle  phases;  (7)  the  provision  of  evidence 
by  demonstration  testing  that  the  system 
meets specified reliability objectives; and
lastly, (8) provisions for the periodic reporting
of  system  reliability  status,  problems,  and 
trends.  The  plan  should  include  a  descrip¬ 
tion  of  the  system  and  equipment  operation 
related  to  the  mission  phases  for  which  reli¬ 
ability  is  to  be  evaluated,  and  should  relate 
reliability  evaluation  tasks  to  system  program 
milestones. 

Primary  information  needed  to  develop  the 
reliability  evaluation  plan,  such  as  the  results 
of  Mission  Analysis  and  System  Analysis  (e.g. 
environmental  and  duty  cycle  profiles  for  sys¬ 
tem  elements,  configuration  baselines,  etc.) 
must  be  developed,  documented  and  provided 
to  personnel  charged  with  the  development  of 
the  plan.  This  and  other  essential  supporting 
information  are  identified  in  §  10.2. 

Not all information needed to complete the
plan is normally available early in the program;
however, submission of the plan should
not be delayed because it is incomplete in
some  of  its  details.  Instead,  tentative  or  pre¬ 
liminary  information,  with  a  schedule  for  pro¬ 
viding  mission  details,  should  be  submitted 
for  review.  The  plan  should  be  completed  as 
early  in  the  development  phase  as  possible. 
Minor  revisions  and  task  schedule  updates 
should  normally  be  done  on  an  annual  basis. 

10.1.1.2  Maintainability  Program  Plan 

The  Maintainability  Program  Plan  should 
delineate  the  approach  and  methods  the  con¬ 
tractor  will  use  to  fulfill  contractual  main¬ 
tainability  program  requirements.  The  main¬ 


tainability  program  should  establish  realistic 
maintainability  design  criteria,  provide  peri¬ 
odic  assessments  of  achieved  maintainability, 
identify  significant  maintainability  problem 
areas,  and  provide  evidence  of  compliance 
to  contractual  maintainability  program  re¬ 
quirements.  The  plan  should  relate  maintain¬ 
ability  program  activities  to  system  program 
milestones.  Specific  tasks  to  be  discussed  in 
the  plan  include:  (1)  apportionment  of  sys¬ 
tem  maintainability  objectives  to  system  ele¬ 
ments  to  be  maintained  during  the  system’s 
operational  readiness  phase;  (2)  providing 
periodic  predictions  of  the  system’s  main¬ 
tainability  characteristics  under  tactical  condi¬ 
tions;  (3)  the  examination  and  evaluation  of 
proposed  and  actual  designs,  including  soft¬ 
ware,  in  order  to  establish  the  most  effective 
and efficient design for preventive and corrective
maintenance; (4) the provision of a system
for the collection of essential maintainability
data, i.e., data necessary to support
system maintainability design, and maintainability
analysis and assessment functions;
(5) assessment of maintainability; (6) demonstration
of the achievement of maintainability
objectives; and (7) the periodic reporting
of maintainability program status, problems,
and trends. Additional details on the
nature of the above tasks are provided in
NAVSEA OD 21549[1] paragraphs 3.3.2,
3.3.2.1, ..., 3.3.2.7.

10.1.1.3  Reliability  Demonstration  Test  Plan 

The  contractor  should  provide  evidence 
that  the  system  meets  specified  reliability 
objectives.  When  stipulated  in  the  contract, 
the  contractor  should  prepare  a  plan  for  the 
formal  demonstration  of  the  reliability  of  the 
system.  System  reliability  should  be  demon¬ 
strated  at  the  highest  assembly  level  practica¬ 
ble  using  units  that  represent  the  production 
configuration.  Test  units  should  be  subjected 
to  environments  and  operational  demands 
similar  to  those  anticipated  in  tactical  use. 
The  test  plan  should  include,  as  a  minimum: 

a.  Identification  of  hardware  and  quantity 
to  be  tested. 

b. Identification of software to be tested.

c.  Test  objectives  and  type  of  test  plan 
selected. 

d.  Test  plan  criteria  for  each  assembly  of 
hardware/software  undergoing  demonstration 
testing: 
• Values for R* , R^ , α and β.

•  Demonstration  pass/fail  criteria. 

• Character of underlying failure distribution:
exponential, binomial, normal,
etc.

e.  Test  requirements  including  parameters 
to  be  measured,  environments  to  be  simu¬ 
lated,  test  time,  facilities,  test  and  measuring 
equipment  and  related  software. 

f.  Requirements  for  data  collection,  anal¬ 
ysis  and  reporting. 

g.  Criteria  for  continuing  test  in  the  event 
a  failure  occurs. 
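
The interplay between the pass/fail criterion, the test time and the two risks can be sketched for one simple case. The sketch assumes an exponential failure distribution, a fixed-duration test in which failed units are repaired or replaced, and acceptance when the number of failures does not exceed a stated limit. The function name, the use of MTBF as the reliability measure, and all numerical values are hypothetical and are not taken from this manual.

```python
import math


def acceptance_probability(true_mtbf, test_hours, max_failures):
    """Probability of passing a fixed-duration test.

    Under the exponential assumption the number of failures in test_hours
    is Poisson distributed with mean test_hours / true_mtbf; the test is
    passed when that number does not exceed max_failures.
    """
    mean = test_hours / true_mtbf
    return sum(
        math.exp(-mean) * mean ** k / math.factorial(k)
        for k in range(max_failures + 1)
    )


# Hypothetical plan: 1000 test hours, pass only with zero failures.
TEST_HOURS = 1000.0
MAX_FAILURES = 0

# Consumer's risk: chance of passing when the true MTBF is only 500 hours.
consumer_risk = acceptance_probability(500.0, TEST_HOURS, MAX_FAILURES)

# Producer's risk: chance of failing when the true MTBF is 5000 hours.
producer_risk = 1.0 - acceptance_probability(5000.0, TEST_HOURS, MAX_FAILURES)
```

Lengthening the test or changing the allowed number of failures shifts both risks, which is why the plan must state the risk values, the pass/fail criteria and the assumed distribution together.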

10.1.1.4 Maintainability Demonstration Test
Plan 

The  contractor  should  demonstrate  achieve¬ 
ment  of  system  maintainability  objectives 
when required by the contract. This demonstration
should be conducted using the same tools,
diagnostic and support equipment, documentation
and software that will be used during
shipboard maintenance. The approach and the
details of demonstration, including the selection
of demonstration personnel, should be
described  in  a  Maintainability  Demonstration 
Test  Plan.  The  plan  should  also  include  re¬ 
quirements  for  data  collection,  analysis  and 
reporting,  and  demonstration  pass/fail  cri¬ 
teria. 

10.1.2 Reliability and Maintainability
Demonstration  Test  Procedures 

The  contractor  should  prepare  detailed  test 
procedures  to  assure  full  and  controlled 
implementation  of  the  demonstration  plans 
for  system  reliability  and  maintainability 
described  above.  These  procedures  should 
provide  clear  and  specific  instructions  on  the 
completion  of  each  step  in  the  testing  process. 
They  should  be  tailored  to  the  special  features 
and  elements  of  each  of  the  demonstration 
test  plans.  The  following  information  should 
be  included  in  the  test  procedures,  as  appro¬ 
priate: 

a.  Characteristics  to  be  measured,  includ¬ 
ing  tolerances. 

b.  Input  and  load  values,  including  tol¬ 
erances. 

c.  Identification  of  test  and  measuring 
equipment,  recording  equipment  and  sup¬ 
porting  software. 


d.  Identification  of  special  equipment  or 
facilities. 

e.  Method  to  be  used  in  test  performance, 
including  sequential  steps.  Military  standard 
test  methods  shall  be  used  when  applicable. 

f.  Verifications  to  be  made  before  con¬ 
duct  of  test. 

g.  Instructions  for  data  recording. 

h.  Actions  to  be  taken  in  the  event  of  test 
interruptions. 

i.  Pass  or  fail  criteria. 

j.  Applicable  safety  precautions  for  per¬ 
sonnel  and  facility  protection. 

k.  Diagram  or  detailed  description  of  the 
test  set-up  such  as  interconnection  infor¬ 
mation,  relative  equipment  placement,  mount¬ 
ing  of  sensors,  and  grounding  points. 

l.  Identification  of  calibration  and  pre¬ 
ventive  maintenance  requirements  for  items 
under  test  or  test  facility  equipment. 

m.  Descriptions  of  test  conditions;  en¬ 
vironments,  duty  cycles,  work  space  con¬ 
straints,  etc. 

10.1.3  Reports 

10.1.3.1  Prediction  Reports 

Starting early in the design phase, Reliability
and Maintainability* Predictions
should  be  utilized  in  the  design  process. 
These  predictions  should  be  updated  as  the 
design  progresses  and  continue  through  the 
completion  of  design.  Prediction  techniques 
should  also  be  utilized  to  evaluate  the  impact 
on  reliability  and  maintainability  of  proposed 
corrective  actions  to  correct  deficiencies. 
Final  predictions  should  be  updated  to  re¬ 
flect  design  changes  resulting  from  corrective 
action  and  R  and  M  improvement  activities. 

Formal  prediction  reports  should  reflect 
the  design  and  knowledge  available  at  each 
report  date.  However,  the  working  file  should 
be  kept  current  (updated,  for  example, 
monthly).  The  report  should  provide  suf¬ 
ficient  information  (worksheets,  data  source, 
etc.)  to  permit  results  to  be  verified  and  to 


*Maintainability Predictions are covered in this paragraph;
however, Maintainability Prediction Reports
are not generally submitted to SSPO for approval or
information. Maintainability analysis material is reviewed
during program evaluations at contractor
facilities.
provide traceability. Figures 10-1 through
10-12  are  examples  of  the  content  required 
for  various  prediction  worksheets.  Each  pre¬ 
diction  report  should  also  include  a  summary 
of  the  prediction  results,  to  include  as  a  mini¬ 
mum,  a  discussion  of  parts,  subassemblies, 
etc.  not  meeting  the  design  stress  guidelines, 
repair  actions  not  being  within  the  time  con¬ 
straint  allocations,  problem  areas  (such  as 
overly  complex  circuitry),  comparison  of  re¬ 
sults  with  requirements,  any  corrective 
actions  required,  proposed  corrective  action, 
and  reason  for  significant  changes  in  pre¬ 
dicted  values.  When  Bayesian  evaluation 
is  contemplated  and  the  predictions  are  to  be 
(or  are  being)  used  as  prior  information,  the 
prediction report should discuss the assumptions
made and the procedure used for establishing
the strength of the prior and its sensitivity
to contravening data.
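
One way to make the strength of a prior and its sensitivity to contravening data concrete is sketched below. The sketch assumes a constant failure rate with a gamma prior, which is a common conjugate choice and not necessarily the procedure intended by this manual. The prior is expressed as a number of pseudo-failures in a number of pseudo test hours; all names and values are hypothetical.

```python
def gamma_posterior(prior_failures, prior_hours, observed_failures, observed_hours):
    """Update a gamma prior on the failure rate with exponential test data.

    The prior acts like prior_failures failures seen in prior_hours hours,
    so prior_hours is a direct measure of the strength of the prior.
    """
    return prior_failures + observed_failures, prior_hours + observed_hours


def mean_failure_rate(failures, hours):
    """Mean of a gamma distribution with shape `failures` and rate `hours`."""
    return failures / hours


# Both priors encode the same predicted failure rate (0.001 per hour),
# but the strong prior carries ten times the weight of the weak one.
weak_prior = (1.0, 1000.0)
strong_prior = (10.0, 10000.0)

# Contravening test data: 5 failures in 1000 hours (0.005 per hour observed).
weak_posterior = mean_failure_rate(*gamma_posterior(*weak_prior, 5, 1000.0))
strong_posterior = mean_failure_rate(*gamma_posterior(*strong_prior, 5, 1000.0))
```

With the weak prior the estimate moves to 0.003 failures per hour, while the strong prior holds it near 0.0014. A prediction report can show this kind of comparison to document how readily the prior yields to test data.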

R  and  M  predictions  should  be  an  integral 
part  of  the  design  review  file.  The  contractor 
may  incorporate  reliability  prediction  updates 
in  the  Reliability  Status  Reports  (RSRs), 
if  advantageous  to  do  so  and  if  not  in  con¬ 
flict  with  required  data  item  submittals. 
Maintainability  predictions  should  be  a  part 
of  the  Maintainability  Status  Reports  (MSRs). 

10.1.3.2 Status Reports

Reliability and maintainability status reports
are submitted periodically, usually
quarterly, to SSPO throughout the life cycle
of  the  product.  The  reports  should  present 
an  assessment  of  the  system  reliability  and 
maintainability.  An  availability  assessment,  as 
necessary  or  as  required,  should  be  provided 
in  the  reliability  status  report  (RSR).  The 
contractor  should  utilize  qualitative  and  quan¬ 
titative  techniques  in  the  analysis  of  the 
present  status  of  the  program.  Each  report 
should  indicate  the  current  status  compared 
to  requirements  and  projected  growth  thereby 
isolating  problem  areas.  The  reports  should 
address  the  problem  areas  and  the  corrective 
actions,  both  planned  and  accomplished. 

Each  report  should  contain,  as  appropriate: 

a.  A  management  summary,  perhaps  the 
most  important  portion  of  the  report,  con¬ 
taining  program  status,  problems,  corrective 
actions  and  a  discussion  of  significant  trends, 
events  and  achievements.  The  discussion  of 
problem  areas  should  include,  as  a  minimum, 
all  items  for  which  the  estimated  reliability 


or  maintainability  are  below  the  required  or 
apportioned  values,  the  cause(s)  of  the 
shortfall,  the  proposed  corrective  action,  the 
date  or  serial  number  effectivity  of  the  cor¬ 
rective  action,  and  the  anticipated  effective¬ 
ness  of  the  corrective  action.  Subsequent  re¬ 
ports  should  monitor  the  effectiveness  of  the 
corrective  actions  taken. 

b.  A  discussion  of  potential  problems  un¬ 
covered  by  analysis  or  test,  that  although 
not  totally  verified  by  test  or  operation,  may 
degrade  product  reliability  and  maintain¬ 
ability  if  not  resolved. 

c.  The  status  of  the  evaluation  tasks  versus 
the  schedule  in  the  approved  Reliability  Eval¬ 
uation  Plan  or  Maintainability  Program  Plan 
with  a  discussion  of  each  slippage,  its  impact 
on  program  schedule  and  its  impact  on  sys¬ 
tem  reliability  and  maintainability. 

d.  A  discussion  of  the  data  inputs  to  the 
R  and  M  models,  including  the  number  of 
successes,  the  number  of  failures,  the  oper¬ 
ating  time,  repair  times,  downtimes,  and  the 
data  sources,  as  applicable.  A  discussion  of 
the  prior  distribution  should  be  included  in 
the  RSR  when  Bayesian  methods  are  em¬ 
ployed.  (The  basic  description  would  appear 
in  the  reliability  evaluation  plan  or  main¬ 
tainability  program  plan,  however,  a  discus¬ 
sion  of  the  similarity  of  the  test  data  and  the 
prior  may  be  appropriate  in  this  report.)  In 
early  reports,  the  data  used  for  prediction  is 
stressed,  whereas  in  later  reports  the  measure¬ 
ment  and  demonstration  data  are  emphasized. 
This  transition  should  be  acknowledged  and 
discussed. 

e.  A  table  of  the  current  apportioned,  pre¬ 
dicted,  and  measured  (best  estimate  and  80% 
lower  bound)  reliability  indices  for  the  sub¬ 
system,  equipment,  components,  and  soft¬ 
ware  for  each  mission  phase  (operational  read¬ 
iness,  launch,  and  flight),  as  appropriate  (e.g., 
figure  10-13),  and  an  explanation  of  any  sig¬ 
nificant  changes  since  the  last  report. 

f.  Growth  curves  showing  measured  reli¬ 
ability  versus  program  time  for  the  subsystem 
and  new  equipments.  These  curves  should 
show  the  objective,  the  previous  four  status 
points,  the  present  point,  and  should  project 
at  least  one  year  into  the  future  the  expected 
reliability  status. 

g.  A  brief  description  of  any  changes  to 
the  system,  subsystem  and  equipment  opera¬ 
tion,  and  of  the  mission  for  which  reliability 
and  maintainability  is  being  reported. 



Figure 10-1. Sample Initial Reliability Prediction

Figure 10-2. Sample Intermediate Reliability Prediction (Assumed Worst Case Design Stress)

Figure 10-4. Sample Initial Reliability Prediction

Figure 10-5. Sample Initial Reliability Prediction (Mixed Parts Count)

Figure 10-6. Sample Intermediate Reliability Prediction (Assumed, Not to Exceed, Electrical and Thermal Stress)

Figure 10-7. Sample Final Reliability Prediction (Stress Analysis)

Figure 10-8. Sample Initial Maintainability Prediction

Figure 10-9. Sample Intermediate Maintainability Prediction

h. A discussion of any changes to the reliability
and maintainability block diagrams
and  mathematical  equations  including  any 
assumption  changes. 

i.  A  detailed  explanation  of  any  changes  in 
methodologies  and  their  impact  on  calcula¬ 
tions. 
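
For success/failure data, the best estimate and 80% lower bound called for in item e can be sketched as follows. The sketch uses the classical one-sided binomial (Clopper-Pearson) bound, which is an assumption made here for illustration; the approved Reliability Evaluation Plan governs the method actually used. The function name and sample values are hypothetical.

```python
import math


def binomial_lower_bound(successes, trials, confidence=0.80):
    """One-sided lower confidence bound on the success probability.

    Finds the probability p at which observing `successes` or more
    successes in `trials` trials has probability 1 - confidence.
    """
    if not 0 <= successes <= trials or trials == 0:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    if successes == 0:
        return 0.0
    alpha = 1.0 - confidence

    def upper_tail(p):
        # P(X >= successes) for X ~ Binomial(trials, p); increases with p.
        return sum(
            math.comb(trials, k) * p ** k * (1.0 - p) ** (trials - k)
            for k in range(successes, trials + 1)
        )

    low, high = 0.0, 1.0
    for _ in range(100):  # bisection on p
        mid = (low + high) / 2.0
        if upper_tail(mid) < alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical data: 10 successes in 10 trials.
best_estimate = 10 / 10
lower_bound = binomial_lower_bound(10, 10)
```

When every trial succeeds, the bound reduces to (1 - confidence) raised to the power 1/trials, which gives a quick check on the calculation.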

10.2  CONTRACTOR  IN-HOUSE 
DOCUMENTS 

Documentation  of  the  type  listed  or  de¬ 
scribed  in  this  subsection  of  the  manual 
should  be  prepared  by  the  contractor  to  con¬ 
trol  and  support  the  overall  system  design 
and  evaluation  effort.  Additionally,  these 
documents  are  used  to  organize  and  prepare 
RMA reports (§ 10.1) submitted to SSPO in
accordance with the CDRL. This documentation
(indexed to the relevant paragraph of
NAVSEA OD 21549[1]) includes, as a minimum,
information developed for or contained
in  the  following: 

a. Program Plan Matrix (3.1.2.1)

b. Internal Audit Reports (3.1.4.2)

c. Integrated Data System Plan (3.1.7)

d. Corrective Action System, Description
of (3.1.8)

• Problem/Failure Reporting, Investigation,
Analysis and Corrective Action

• Trouble Failure Reports (TFRs),
contractor use of

e. Government-Industry Data Exchange
Program (GIDEP) (3.1.11)

• Utilization of GIDEP Data, especially
GIDEP ALERTS

f. Analysis for Design (3.2.1)

• Mission Analysis: Environmental and
Duty Cycle Profiles

• Software Functions Analysis

g. Design Practices and Documentation
(3.2.2)

h. Parts, Devices, and Material Selection
Guide (3.2.3.1)

i. Project Parts, Devices, and Materials List
(3.2.3.2)

j. Design Review Plan (3.2.4)

• Results of Design Reviews

k. Parameter Studies (3.2.5)

• Parameters Document (3.2.5.b.)

l. Identification and Classification of Characteristics
(3.2.6)

• Identification of Characteristics That
Could Affect the Coordination, Life,
Interchangeability, Function or Safety
(CLIFS) of System Elements

m. Controlled and Limited-Life Items
(3.2.7)

• Determination and Identification of

n. Software Verification (3.2.8)

o. Integrated Test Program (3.4.1)

• Integrated Test Program Plan (3.4.2)

• Integrated Test Program Status Reports
(3.4.3)

• Test Reports (3.4.6)

p. Qualification Test Program (3.4.9)

q. Software Tests (3.4.10)

r. Configuration Management Program
(3.5)

• Configuration Identification (3.5.1)

• Configuration Baselines (3.5.1.1)

• Design Disclosure Documentation
(3.5.5)

• Development Documentation
(3.5.5.1)

• Production Design Disclosure
(3.5.5.2)

For  those  readers  who  desire  more  details  on 
the  documentation  cited  above,  these  details 
may  be  found  within  the  relevant  paragraphs 
of NAVSEA OD 21549[1] identified in
parentheses  following  each  type  of  documen¬ 
tation. 

In  the  paragraphs  which  follow,  additional 
description  and  comment  are  provided  in 
selected  documentation  areas  that  relate 
closely  to  RMA  evaluation  program  activities. 
The information documented in these areas
has many uses, such as in:

•  Spares  provisioning  (estimating  spares 
usage  and  logistic  support  requirements). 

• Evaluating the limits for the infant mortality
or burn-in period, the useful life
period,  and  the  wearout  period. 

•  The  development  of  objective  criteria  for 
the removal, replacement, and disposition
of Limited-Life items.

•  The  establishment  of  a  compendium  of 
R  and  M  data  based  on  actual  test  exper¬ 
ience. 

10.2.1  Test  History  File 

This  file  contains  an  entry  (see  figure  9-3) 
for  each  test  conducted.  It  provides  a  com¬ 
plete  record  of  all  test  data  in  a  form  that 
enables  various  reports  to  be  generated  quick¬ 
ly  and  efficiently  (e.g.,  see  figure  9-16,  Data 
Flow). 
Data in the test history file may be summarized
(sorted) over any or all fields as desired.
For example, by using a control on date
of  test,  reports  can  be  generated  relative  to 
program  periods  of  interest  (e.g.,  the  last  two 
years).  By  sorting  on  level  of  test,  reports 
can  be  generated  which  contain  only  data  ob¬ 
tained  in  equipment  and  higher  level  tests.  A 
control  on  type  of  test  permits  reports  to  be 
generated  using  only  qualification  or  flight 
test  data.  The  flexibility  to  develop  any  de¬ 
sired  summary  should  be  readily  available. 
Figure  10-14  is  an  example  of  a  typical  test 
history  file  format. 
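
The controls described above can be sketched as simple filters over test history records. The record keys, the one-letter test type codes (A, Q, F) and the sample entries are hypothetical; the level codes follow the indenture scheme of this subsection (1 - system through 6 - part), so "equipment and higher" means a level code of 3 or less.

```python
from datetime import date

# Hypothetical test history entries; keys loosely mirror the file columns.
TEST_HISTORY = [
    {"name": "UNIT A", "test_level": 3, "test_type": "Q",
     "test_date": date(1976, 5, 2), "failures": 0},
    {"name": "UNIT B", "test_level": 4, "test_type": "A",
     "test_date": date(1977, 1, 15), "failures": 1},
    {"name": "UNIT C", "test_level": 2, "test_type": "F",
     "test_date": date(1977, 9, 30), "failures": 0},
]


def select_tests(records, since=None, max_level=None, test_types=None):
    """Return the records that pass every control that is supplied."""
    selected = []
    for rec in records:
        if since is not None and rec["test_date"] < since:
            continue  # control on date of test
        if max_level is not None and rec["test_level"] > max_level:
            continue  # control on level of test
        if test_types is not None and rec["test_type"] not in test_types:
            continue  # control on type of test
        selected.append(rec)
    return selected


equipment_and_higher = select_tests(TEST_HISTORY, max_level=3)
qualification_or_flight = select_tests(TEST_HISTORY, test_types={"Q", "F"})
recent = select_tests(TEST_HISTORY, since=date(1977, 1, 1))
```

Because each control is optional, any combination of them can be applied in one call, which reflects the flexibility the file is expected to provide.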

It  is  essential  that  the  data  in  the  test 
history  file  be  as  accurate  and  complete  as 
possible  since  it  forms  the  basis  of  all  other 
reports.  See  §  9.3  for  data  control  procedures 
and  techniques  to  improve  data  accuracy. 

The  following  provides  a  description  of  the 
columns  of  figure  10-14: 

Name 

A  twenty  character  field  for  the  name  of 
the  item. 

Level 

A one character field for the indenture
level of the item, e.g.,

Hardware:
1 - system
2 - subsystem
3 - equipment
4 - component
5 - module
6 - part

Software:
7 - software

Note: Contractor may break software down to
lower indenture levels when advantageous. The
Evaluation Plan must fully identify the categories
used.

Drawing  Number 

A  fifteen  character  field  for  the  drawing 
number,  e.g., 

four  digit  prefix 

letter  indicating  drawing  size 

four  digit  suffix 

letter  P  -  part  or  G  -  group 

three  digit  part  or  group  number 

two  characters  for  revision  identification 


Serial  Number 

A  six  character  field  for  the  contractor 
serial  number. 

Vendor  Name  Code 

A  six  character  field  for  identifying  the 
vendor.  The  Federal  Stock  Code  (FSC)  num¬ 
ber  is  often  used  for  this  purpose. 

Vendor  Serial  Number 

A  six  character  field  for  the  vendor  serial 
number. 

Level 

A one character field for the level of the
test being performed, coded as above (1 - system,
2 - subsystem, etc.).

Type 

A  one  character  field  for  the  type  of  test 
(e.g.,  acceptance,  qualification,  demonstra¬ 
tion,  flight). 

Environment 

A  two  character  field  for  the  environment 
of  test  (e.g.,  ambient,  temperature,  vibration, 
salt  spray). 

Test  Site 

A  six  character  field  which  identifies  the 
location at which the test was performed.

Date  of  Test 

A six character field for the day, month,
and  year  of  test. 

Test Report Number

A  seven  character  field  for  the  test  report 
number. 

State 

A  one  character  field  for  the  state  of  test 
(A and B - non-operating, C - cycling, and
D - operating; see § 4.1.2).
Time or Cycles

A six character field. For states A, B, and D,
the first four characters represent hours and the
next two characters represent minutes
of test; for state C, the six characters
represent cycles of operation. Consideration
should  be  given  to  including  time  recording 
device  (ETI,  ETM)  readings,  when  available, 
in  an  added  column. 

Failures 

A  one  character  field  indicating  that  a 
failure  or  discrepancy  (such  as  the  failure  of 
one  element  of  a  redundant  configuration) 
occurred in the test. (Note: a number is entered
for all failures or discrepancies, not just
reliability  failures;  applicable  failure  classifi¬ 
cations,  failure  report  numbers,  and  main¬ 
tenance  data  are  entered  when  a  failure  or 
discrepancy  occurs.) 

Failure  Classification 

A  two  character  field  permits  classification 
of  failures.  The  first  character  defines  the  rele¬ 
vance  of  the  failure  for  reliability  calculations. 

0  -  not  relevant 

1  -  relevant 

2 - previously relevant, corrective action
reduced classification to non-relevant

Reliability calculations are based upon the
failures classified as one in this column (except
for reliability growth models, which should
use failures classified as one or two). If corrective
action is not effective on a particular
item, the two classification must be changed
to one for affected failures.

The second character is the FMECA classification
(1 - minor, 2 - major, 3 - critical,
4 - catastrophic; see § 4.2.5.2).

Failure  Report  Number 

A  field  (seven  characters)  for  the  failure 
report  number. 

Spares 

A one character field indicating the status
of spares for the particular failure (0 - required
and available, 1 - required and not
available, 2 - not required).


Corrective  Maintenance  Time 

A  six  character  field,  the  first  four  are  the 
hours,  and  the  last  two  the  minutes  required 
to  perform  the  corrective  maintenance.  (Cor¬ 
rective  maintenance  time  includes  fault  loca¬ 
tion,  isolation,  correction,  adjustment,  cali¬ 
bration,  and  repair  checkout  times.) 

Delay  Time 

A  six  character  field,  the  first  four  are  the 
hours  and  the  last  two  the  minutes  of  delay 
(delay  time  includes  administrative  and  sup¬ 
ply  delay  times). 

Project  Code 

A  four  character  field  which  indicates  the 
project  on  which  the  test  was  run  (used  only 
when  data  from  another  project  is  being  used 
to  supplement  data  on  the  current  project). 

Associated  Test  Report  Number 

A  seven  character  field  which  references  a 
higher  level  test  from  which  the  record  was 
generated. 

Associated  Failure  Report  Number 

A  seven  character  field  which  references  a 
higher  level  failure  report. 

Date of Entry

A  six  character  field  which  indicates  the 
day,  month,  and  year  that  the  data  is  entered 
into  the  data  system. 
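
The column descriptions above imply a fixed-width record of 131 characters. The sketch below shows one way such a record could be packed and read. It assumes the fields are stored in the order in which they are described here; the field names, helper functions and sample values are hypothetical and are not taken from figure 10-14.

```python
# (field name, width in characters), in the order described in this subsection.
FIELDS = [
    ("name", 20), ("item_level", 1), ("drawing_number", 15),
    ("serial_number", 6), ("vendor_code", 6), ("vendor_serial", 6),
    ("test_level", 1), ("test_type", 1), ("environment", 2),
    ("test_site", 6), ("test_date", 6), ("test_report", 7),
    ("state", 1), ("time_or_cycles", 6), ("failures", 1),
    ("failure_class", 2), ("failure_report", 7), ("spares", 1),
    ("corrective_time", 6), ("delay_time", 6), ("project_code", 4),
    ("assoc_test_report", 7), ("assoc_failure_report", 7), ("entry_date", 6),
]
RECORD_LENGTH = sum(width for _, width in FIELDS)


def format_record(values):
    """Pack a dict of field values into one fixed-width line."""
    return "".join(
        str(values.get(name, "")).ljust(width)[:width] for name, width in FIELDS
    )


def parse_record(line):
    """Split one fixed-width line into a dict of stripped field values."""
    line = line.ljust(RECORD_LENGTH)
    record, position = {}, 0
    for name, width in FIELDS:
        record[name] = line[position:position + width].strip()
        position += width
    return record


def hhhhmm_to_hours(field):
    """Convert a six character HHHHMM field to decimal hours."""
    return int(field[:4]) + int(field[4:6]) / 60.0


def counts_as_failure(record, growth_model=False):
    """Apply the relevance rule: class 1 only, or 1 and 2 for growth models."""
    relevance = record["failure_class"][:1]
    return relevance in (("1", "2") if growth_model else ("1",))


# Hypothetical entry: 12 hours 30 minutes of operating test, one failure
# previously relevant (2) and classified critical (3) under FMECA.
line = format_record({
    "name": "SAMPLE ITEM", "item_level": "4", "drawing_number": "1234A5678P001AA",
    "state": "D", "time_or_cycles": "001230", "failures": "1",
    "failure_class": "23",
})
parsed = parse_record(line)
test_hours = hhhhmm_to_hours(parsed["time_or_cycles"])
```

Keeping the layout in a single table of names and widths means a change to the record format is made in one place and both the packing and the parsing follow it.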

10.2.2  Corrective  Action  System  Reports 

The  contractor  must  have  a  closed  loop 
corrective  action  system.  This  system  must 
consist  of  complete  problem  and  failure 
reporting,  investigation,  analysis  and  correc¬ 
tive  action.  The  corrective  action  process 
should  result  in  an  effective  resolution  of 
problems  and  failures.  Failure  Summary  Re¬ 
ports  for  hardware  and  software  failures  are 
valuable  documents  to  assure  a  positive 
corrective  action  program.  The  use  of  the 
term  failure  in  the  remainder  of  this  discus¬ 
sion  should  also  be  taken  to  include  prob¬ 
lems. 
Each hardware failure summary report
should present a complete record of all open
hardware failures and those that were closed
since the last report. The report provides a
tool for assuring: (1) that failures are identified
and reported to cognizant engineers, to
management, and to SSPO; (2) that significant
and repetitive failures are analyzed
in depth; (3) that the causes and modes of
failure are determined correctly; and (4) that
effective action is identified and taken.

The  report  is  normally  issued  monthly  and 
contains  hardware  identification,  including 
nomenclature,  drawing  number,  serial  num¬ 
ber,  part  vendor,  and  program  code;  test  de¬ 
scription  including  test  type,  environment, 
site  or  reporting  activity,  and  date  of  test; 
test  results  including  failure  report  number, 
failure  classification,  and  description  of  fail¬ 
ure,  to  include  visual  observations;  indication 
of  whether  formal  failure  analysis  board 
action  is  required  along  with  results  of  the 
analysis  including  corrective  actions  recom¬ 
mended  and  taken;  and  responsible  personnel 
including  the  design  and  quality  control 
engineers.  A  separate  tabulation  of  repetitive 
failure modes should be included and discussed.
A sample Failure Summary Report
is included as figure 10-15. Failures can be
either hardware or software in origin. Certain
failures may be identified immediately as
software or as hardware; however, some failures
may not initially be identified as such.
The  tracking  of  this  “unknown”  type  of  fail¬ 
ure  presents  problems  which  the  system  must 
handle.  When  a  failure  is  entered  on  a  sum¬ 
mary  report,  analysis  should  have  identified 
the  type  of  failure. 

A  software  failure  summary  report  pre¬ 
senting  a  complete  record  of  all  software 
failures  that  have  occurred  provides  an  ana¬ 
logous  report  for  software  reliability. 

A  Failure  Analysis  Follow-Up  Report 
should  be  issued  as  required  (normally  month¬ 
ly)  for  internal  action  and  information.  It 
should  list  each  action  item  generated  by 
failure  analysis  and  show  the  status  of  action 
items,  cumulatively.  This  control  provides  for 
a  closed  loop  on  failures  and  corrective 
actions.  Figure  10-16  illustrates  this  report. 

10.2.3  Serial  Number  Summary  Report 

The  serial  number  summary  report  contains 
a  one-line  entry  for  each  selected  serialized 


piece  of  hardware  tested.  The  primary  pur¬ 
pose  of  the  serial  number  report  is  to  keep 
track  of  time  accumulated  on  critical  and 
time-  (or  cycle-)  sensitive  equipment.  The 
accumulated  test  time  is  compared  with  the 
allowable maximum test time (e.g., the red-time
line of figure 2-2). If the test time exceeds
the red-time line, a determination
should be made whether the item should remain in
operation or be replaced or overhauled. This
determination should be made following an
analysis of the reasons for the overtesting (see
figure 9-16).

10.2.4  Environmental  Summary  Report 

The  environmental  summary  report  con¬ 
tains  a  one-line  summary  for  each  environ¬ 
ment  within  a  drawing  number.  The  number 
of  items  tested,  the  number  of  failures  and 
the  test  time  accumulated  in  states  A,  B  and 
D  and  the  test  cycles  accumulated  in  test  state 
C  are  provided.  The  entries  in  this  report  are 
not  normalized  to  equivalent  missions.  The 
information  in  this  report  is  useful  in  prepar¬ 
ing  the  failure  rate  compendium  (see  figure 
9-16). 

10.2.5  Mission  Simulation  Report 

The  mission  simulation  report  is  similar 
to  the  environmental  summary  report.  The 
primary  differences  are  that  only  mission  en¬ 
vironments  are  carried  and  the  results  are  pre¬ 
sented  in  equivalent  missions  rather  than  time 
and  cycles. 

The  mission  simulation  report  arranges  test 
data  in  a  form  convenient  for  calculating  reli¬ 
ability.  In  order  to  save  computation  time  at 
this  stage,  all  hardware,  software  and  environ¬ 
mental  data  which  is  not  required  for  the  reli¬ 
ability  status  report  is  eliminated  from  com¬ 
putation.  This  report  contains  one  line  of  in¬ 
formation  for  each  mission  environment  in 
which  the  hardware  and  software  are  tested. 
This  report  is  used  as  an  input  for  preparing 
the  reliability  status  report  (see  figure  9-16). 

10.2.6  Hardware  and  Software  Summary 
Report 

The hardware and software summary report
contains a one-line entry for each drawing
number tested. It also contains a summation
of total test time and failures accumulated
on each drawing number. This report is
useful in logistic (i.e., hardware spares, software
maintenance, support personnel, etc.)
planning. The report can be expanded to include
failure rates for early spares provisioning
if these rates are not produced as part
of the reliability evaluation plan (see figure
9-16).

Figure 10-16. Failure Analysis Follow-Up Report


10.2.7  Failure  Rate  Compendium  Report 

A failure rate compendium report can be
a valuable by-product of a reliability data system.
Typically, a compendium is a compilation
and summary of the hardware test results
contained in the test history files of the
data system. Data from all projects are summarized
by various hardware groupings to
provide a reference document for failure rates
and failure frequency analysis. The failure
rates are based on actual test experience and
are valuable in making predictions for new
systems. They are also useful in making design
decisions for component and vendor
selection. The failure frequency data summaries
are useful for indicating the relative
severity of environments, and the contributions
to unreliability from design, manufacturing,
and testing activities.

A  compendium  report  can  also  be  used  to 
summarize  failure  experience  due  to  design, 
manufacturing,  test,  handling,  or  unknown 
causes; failure rates by criticality of failure
(i.e.,  no  failure,  catastrophic  failure,  or  out-of- 
specification  failure);  failure  rate  by  test  type 
(i.e.,  qualification,  acceptance,  field);  failure 
rates  by  level  of  test  (i.e.,  component,  equip¬ 
ment,  or  subsystem);  and  failure  rate  by  test 
environment  (i.e.,  vibration,  bench,  high 
temperatures,  etc.).  Summaries  may  be  gen¬ 
erated  for  generic  component  types  and  broad 
equipment  classes. 

A  separate  compendium  report  should  be 
developed  for  software  items  providing  similar 
analysis. 

10.2.8  Test  Effectiveness  Reports 

Since test costs represent a significant percentage
of project resources, the contractor
should monitor the effectiveness of the
integrated test program. NAVORD OD
42282 [2] discusses in detail the planning,
integrating, optimizing, monitoring, control
and reporting necessary for this purpose.


Analysis  of  test  effectiveness,  and  improv¬ 
ing  it  when  required,  is  an  essential  element 
of  the  overall  evaluation  process.  For  ex¬ 
ample,  the  effectiveness  of  procedures  em¬ 
ployed  by  the  contractor  to  eliminate  poten¬ 
tial  defects  from  a  subsystem  can  be  evalu¬ 
ated  using  summaries  of  the  experience  repre¬ 
sented  in  the  data  file. 

A  subsystem  can  be  depicted,  as  in  figure 
10-17,  in  its  flow  through  successive  analyses, 
reviews  and  tests  intended  to  detect  and 
divert  defects  from  passing  downstream  to  the 
operational  use  phase. 

Defects  that  are  present  in  the  subsystem 
and  eligible  for  detection  are  shown  entering 
the  test  block.  Within  the  block  some  defects 
are  generated  in  the  course  of  the  test.  Flow¬ 
ing  out  of  the  block  are  those  defects  that  are 
detected  and  diverted  and  those  that  escape. 
Defects that will enter the next screen downstream
are the sum of the escapes plus any defects
that may have completely by-passed the
block for reasons of ineligibility (e.g., the
test is not designed to detect the failure mode,
or an equipment containing detectable defects
was not installed when the test
was run) or management decision (e.g., a
decision to by-pass the test to meet schedule
or other commitments).

Effectiveness  of  a  test  block  can  be  char¬ 
acterized  by  a  variety  of  indices  such  as: 

$$E = 1 - \frac{\text{Escapes}}{\text{Defects Presented}}$$

It should be noted that the ineligible defects
reduce the test block efficiency; defects
by-passed by management decision, however,
do not enter the test block.

The  effectiveness  of  the  test  block  shown 
in  figure  10-17  is  then: 


The  effectiveness  of  the  screening  process 
(Ep)  is  less  since  the  management  decision 
permitted  ten  defects  to  by-pass  the  screen. 
It  would  be  measured  as 


$$E_p = .839.$$
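
The index above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the counts are hypothetical (they are not the values of figure 10-17), and the treatment of by-passed defects in the screening process index, which adds them to both the defects presented and the escapes, is an assumption rather than a formula stated in the text.

```python
def block_effectiveness(defects_presented: int, escapes: int) -> float:
    """Test block index E = 1 - escapes / defects presented."""
    if defects_presented <= 0:
        raise ValueError("defects_presented must be positive")
    if not 0 <= escapes <= defects_presented:
        raise ValueError("escapes must lie between 0 and defects_presented")
    return 1.0 - escapes / defects_presented


def process_effectiveness(defects_presented: int, escapes: int, bypassed: int) -> float:
    """Screening process index E_p.

    ASSUMPTION: defects that by-pass the block by management decision are
    counted both as presented to the process and as escapes from it.
    """
    if bypassed < 0:
        raise ValueError("bypassed must be non-negative")
    return block_effectiveness(defects_presented + bypassed, escapes + bypassed)


if __name__ == "__main__":
    # Hypothetical counts, not the values shown in figure 10-17.
    print(block_effectiveness(100, 5))
    print(process_effectiveness(100, 5, 10))
```

Under this assumption any by-passed defects lower the process index below the block index, which is the qualitative behavior described for Ep.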


Figure 10-17. Test Screen Model (figure not reproduced; block output labels: DIVERTED, TO THE NEXT SCREEN)

10.2.9 FMECA Summary Reports

As discussed in § 4.2.5, a FMECA shall be
performed to identify potential failure modes
and assess the effects of these modes on the
system,  personnel  and  mission.  The  results  of 
the  engineering  analysis,  as  illustrated  in  the 
worksheets  in  figure  4-27,  shall  be  provided 
to  the  design  community  in  a  FMECA  Sum¬ 
mary  Report  or  series  of  reports.  The  sum¬ 
mary  report  shall  delineate  the  results  of  the 
analysis,  to  include  a  discussion  of  significant 
findings  and  a  detailed  description  of  design 
improvement  recommendations  for  preclud¬ 
ing  or  reducing  impact  of  potential  failures. 
Each  potential  failure  being  evaluated  shall 
include  its  criticality,  as  emphasis  shall  be 
directed toward eliminating severe (catastrophic,
critical, major) failure modes. The
summary  report(s)  shall  be  timely  and  require 
rapid  resolution  of  design  improvement 
recommendations  from  responsible  manage¬ 
ment  and  engineering  personnel.  Figure  10-18 
is  an  example  of  a  FMECA  Summary  Report. 
TO: _  Report Number: _

FROM: _  Date: _

Subject:  FMECA  Summary  Report  for  the  Track  Signal  Controller  6766300. 

A  failure  mode,  effects  and  criticality  analysis  (FMECA)  has  been  conducted  on  the  Track  Signal  Control¬ 
ler  (TSC)  6766300,  two  of  which  are  used  in  PAC  VAN  M46.  Highlights  of  the  analysis  are  provided  below. 

The  analysis  was  performed  by  Reliability,  Maintainability  and  Systems  Engineering  personnel. 

1. UNNECESSARY USE OF A 28VDC REGULATOR. The design makes unnecessary use of a +28VDC
Regulator for energizing relays on the two 8131000 PWAs RA3-108 and RA3-110. This function, the
only one it performs in the unit, can be performed by the +26.5VDC input to the unit from the PAC
VAN +26.5VDC Power Supplies. Because energization of the relays when required is an essential internal
function of the TSC, elimination of the +28VDC Regulator would remove a source of failure from the
unit, reduce the heat generated inside the unit, and cut the cost of the unit and its support. It is recommended
that the +28VDC Regulator be eliminated from the unit design.

2. DIODE CR1. The design uses a forward biased diode CR1 (on PWA 8229000) in series with the +26.5
VDC power output to the Receiver Frequency Selector unit. The output is returned so as to light the
appropriate three lamps of the front panel frequency indicators. In the first place, the diode serves no
purpose whatsoever. In the second place, the diode is not correctly rated for this application. Each of
the three lamps uses a nominal steady state current of 25 milliamperes, or a total of 75 ma for the three
lamps in parallel. CR1 has a rating of 75 ma. This is not sufficient to handle current surges which can be
up to ten times steady state during lamp turn-on. The misapplication of CR1 can be seen from field data
against the TSC 6610800 used in M33 which shows that all four failures of PWA 822900 in that unit were
the result of CR1 failing open. This problem is addressed in RFI No. 4730-070. Because CR1 performs
no useful function and is a proven source of unreliability, it is strongly urged that it be deleted from the
circuit.

3. SIMPLER CIRCUIT NEEDED FOR OUTPUT OF RF RCVD SIGNAL. The relay circuitry used to send
contact closures to the Events Processor and Antenna Position Indicator (signifying the event that a suitably
strong rf signal is being received) can be made more reliable through simplification. The circuit under
discussion here appears in Zone A2, page 2 of 3 of EL 6766301. The recommended change would improve
unit reliability by reducing the number of active components involved in the design. Note that the simplified
design gets rid of relay K8 (as well as K8's driving and control components) and does away with the
5-VDC pull-up scheme.

The  back-up  worksheets  for  this  report  are  on  file  in  Reliability  Engineering. 

Copies: Systems, Design, Maintainability, Reliability, Components, Software Diagnostics and Test Engineering,
Program Manager


Figure  10-18.  Example  of  a  FMECA  Summary  Report 
Appendix A

ANALYSIS  OF  A  FIRE  CONTROL  SYSTEM 


This appendix presents an example of the
analysis of a hypothetical Fire Control system
which is represented by its system block diagram
in figure A-1. In the diagram each
block represents an equipment or group of
equipments. The directions of functional
flows are labeled and inputs and outputs are
identified. Thus the block diagram is a graphical
representation of the dependence of subsystem
performance on the operability of
its hardware elements. The analysis is simplified
for the sake of brevity; however, the
procedures used in each step are illustrated.

A.1 MISSION ANALYSIS

The  mission  consists  of  two  availability 
phases, designated (a) and (b), and a launch
phase.  System  functions  in  phase  (a)  are  lim¬ 
ited  to  monitoring  and  regulation  of  tempera¬ 
ture  and  electrical  power  in  each  of  16 
missile  guidance  systems.  Maximum  duration 
of  phase  (a)  is  4,000  hours.  Phase  (b)  func¬ 
tions  include  those  of  phase  (a)  plus  functions 
necessary  to  control  the  assignment  and  erec¬ 
tion  of  the  missiles.  The  system  is  defined  as 
fully  up  if  it  can  perform  the  phase  (b)  func¬ 
tions  and  initiate  the  launch  phase  for  all  16 
of  the  missiles.  Maximum  duration  of  phase 
(b)  is  1,500  hours.  Functions  during  the 
launch  phase  are  those  required  to  control 
the  preparation  and  firing  of  the  missiles.  The 
launch  phase  has  a  maximum  length  of  3 
hours,  during  which  no  failures  are  permitted. 
Minimum  acceptable  availability  is  0.85  in 
phase  (a)  and  0.99  in  phase  (b).  A  reliability 
requirement  of  0.95  applies  to  the  launch 
phase.  The  system  can  function  in  several 
modes;  however,  the  illustration  will  be  lim¬ 
ited  to  the  tactical  mode  and  to  availability 
phase  (b).  Figure  A-2  shows  the  development 
of  the  mission  profile. 


A.2  SYSTEM  ANALYSIS 

Returning to the system block diagram of
figure A-1, one can construct from it an idealized
reduced block diagram which reveals
the possible “up” or “down” states of the
system. This reduced block diagram is shown
in figure A-3. At first it would seem that as
many as $2^{19}$ states can be identified from the
possible “up” or “down” combinations of
each one of the 19 equipment blocks. If one
assumes, however, that all equipments operate,
fail, and are repaired independently, and
that the minimum system configuration for
functional capability is three independent
stages in series (S consisting of a single block,
C consisting of two blocks in parallel and M
consisting of 16 blocks in series), then the
problem becomes much simpler.

Corrective  maintenance  functions  are  anal¬ 
yzed  in  figure  A-4. 

The equation for system availability is derived
in accordance with the principles previously
set forth in § 4.2.2.2 and Appendix
D § D.1 and D.3. The system consists of
three independent stages in series and is available
when all three stages are up. Thus,

$$A_s = \prod_{j=1}^{3} A_j$$

where $A_j$ is the availability of the jth stage. The
availability of a stage consisting of n equipment
blocks in parallel, any m of which are
required to be up for the stage to be up, is
given by

$$A_j = \sum_{x=0}^{n-m} \binom{n}{x} \left(\frac{\lambda}{\lambda+\mu}\right)^{x} \left(\frac{\mu}{\lambda+\mu}\right)^{n-x} = \sum_{x=0}^{n-m} \binom{n}{x} \frac{\lambda^{x}\mu^{n-x}}{(\lambda+\mu)^{n}}$$


Figure A-1. Fire Control System Block Diagram


NAVSEA  OD  29304B 


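where

$$\binom{n}{x} = \frac{n!}{(n-x)!\,x!}$$

and x is the number of failures.

Then stage wise,

$$A_s = A_S \cdot A_C \cdot A_M$$

Block wise,

$$A_s = \left(\frac{\mu_S}{\lambda_S+\mu_S}\right) \left[\frac{\mu_C^{2}+2\lambda_C\mu_C}{(\lambda_C+\mu_C)^{2}}\right] \left(\frac{\mu_M}{\lambda_M+\mu_M}\right)^{16}$$

where $\lambda$ and $\mu$ denote the failure rate and repair rate of the subscripted block.
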
Data  required  to  support  measurements  are 
implicit  in  the  parameters  of  the  equation. 
They  are  the  uptimes  and  downtimes  of  each 
equipment  block  from  which  statistical  esti¬ 
mates  of  the  failure  rates  and  repair  rates  can 
be  obtained. 
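
The stage and system availability expressions above can be sketched in Python as follows. This is a minimal illustration, not the handbook's procedure: the function names are hypothetical, the failure rates in the example are the predicted values quoted in the apportionment ratios of this appendix (.0035, .0086 and .0083 per hour), and the common repair rate of one repair per hour is an assumption based on the predicted MTTR of one hour mentioned later in the appendix.

```python
from math import comb


def block_availability(lam: float, mu: float) -> float:
    """Steady-state availability of one block: mu / (lam + mu)."""
    return mu / (lam + mu)


def stage_availability(n: int, m: int, lam: float, mu: float) -> float:
    """Availability of a stage of n identical, independent blocks in parallel,
    any m of which must be up (x counts the failed blocks, 0..n-m)."""
    up = block_availability(lam, mu)
    down = 1.0 - up
    return sum(comb(n, x) * down ** x * up ** (n - x) for x in range(n - m + 1))


def system_availability(s, c, m_blocks) -> float:
    """Stages in series: S (1 of 1), C (1 of 2), M (16 of 16).
    Each argument is a (failure_rate, repair_rate) pair for that block type."""
    return (stage_availability(1, 1, *s)
            * stage_availability(2, 1, *c)
            * stage_availability(16, 16, *m_blocks))


if __name__ == "__main__":
    MU = 1.0  # ASSUMED repair rate (one-hour MTTR) for every block type
    a_s = system_availability((0.0035, MU), (0.0086, MU), (0.0083, MU))
    print(f"predicted system availability: {a_s:.3f}")
```

With these assumed inputs the series M stage dominates, and the computed availability falls well short of the .99 required in phase (b), which is consistent with the need for apportionment discussed next.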

Since  the  prediction  does  not  indicate  that 
the  system  will  initially  meet  its  specified 
availability  (.99  in  phase  b),  it  is  necessary  to 
apportion  requirements  to  elements  of  the 
system.  Before  this  is  done,  the  system  should 
also  be  analyzed  with  respect  to  its  reliability 
requirements  since  that  requirement  deter¬ 
mines  the  upper  limit  of  allowable  failure 
rate.  Figure  A-5  summarizes  the  system  pre¬ 
dicted  parameters. 

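The reliability requirement for the system
is R(3 hours) ≥ .95. Then, in terms of blocks
(not stages), the system model is

$$\begin{aligned}
R_s &= R_S \left[1-(1-R_C)^{2}\right] \left[R_M^{16}\right] \\
&= R_S R_C R_M^{16} (2-R_C) \\
&= 2e^{-3(\lambda_S+\lambda_C+16\lambda_M)} - e^{-3(\lambda_S+2\lambda_C+16\lambda_M)} \\
&= .95
\end{aligned}$$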

If  the  prediction  accurately  reflects  the  rel¬ 
ative  complexities,  stress  levels,  state-of-the- 
art  factors,  etc.  that  characterize  each  equip¬ 
ment  block,  it  is  reasonable  to  apportion  fail¬ 
ure  rates  among  the  blocks  in  the  same  ratios 
*The failure rate of an n-block parallel stage is not
constant but is a function of time. For small $\lambda$ it is
approximated by $(\lambda t)^{n}/t$.


Figure  A-5.  Summary  of  Predicted  Parameters 

indicated  by  the  prediction.  Then  the  failure 
rates  of  the  C  and  M  blocks  can  be  expressed 
in  terms  of  the  S  failure  rate. 


$$\lambda_C = \frac{.0086}{.0035}\,\lambda_S = 2.46\lambda_S$$

$$\lambda_M = \frac{.0083}{.0035}\,\lambda_S = 2.37\lambda_S$$


Given  these  relative  magnitudes  of  block  fail¬ 
ure  rates,  the  system  model  becomes 

$$R(3) = 2e^{-41.38\lambda_S(3)} - e^{-43.84\lambda_S(3)} = .95$$


which is easily solved graphically to yield
$\lambda_S = .00044$. Then, by the ratios previously
stated, $\lambda_C = .00108$ and $\lambda_M = .00104$. These
are the highest permissible block failure rates
consistent with the system reliability requirement.
They correspond to MTBF’s of 2280
hours, 930 hours and 960 hours for the S, C
and  M  blocks  respectively,  and  they  represent 
bounds  on  the  tradeoff  regions  available  for 
meeting  the  system  availability  requirement. 
The  system  MTBF  is  found  by  integrating 
the  reliability  function.  It  should  be  noted 
that  the  reliability  of  electronic,  mechanical 
and  electro-mechanical  devices  can  usually  be 
characterized  in  terms  of  constant  failure 
rates (implying exponentially distributed failure
times), provided the devices are complex
and consist of parts of varying ages or mixtures
of part types having different mean
lives and failure distributions [1]. Therefore:


$$\text{MTBF} = \theta = \int_0^\infty 2e^{-.01818t}\,dt - \int_0^\infty e^{-.01926t}\,dt = \frac{2}{.01818} - \frac{1}{.01926} \approx 58 \text{ hours.}$$
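
The graphical solution and the MTBF integral can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, bisection is substituted for the graphical method named in the text, and the ratios 2.46 and 2.37 are the rounded values given above.

```python
from math import exp

RATIO_C = 2.46   # lambda_C / lambda_S, from the predicted failure rates
RATIO_M = 2.37   # lambda_M / lambda_S
N_M = 16         # M blocks in series
A_COEF = 1 + RATIO_C + N_M * RATIO_M       # 41.38
B_COEF = 1 + 2 * RATIO_C + N_M * RATIO_M   # 43.84


def system_reliability(lam_s: float, t: float = 3.0) -> float:
    """R(t) = 2 exp(-41.38 lam_s t) - exp(-43.84 lam_s t)."""
    return 2 * exp(-A_COEF * lam_s * t) - exp(-B_COEF * lam_s * t)


def solve_lam_s(target: float = 0.95, lo: float = 0.0, hi: float = 0.01,
                tol: float = 1e-12) -> float:
    """Bisection for the S failure rate giving R(3) = target.
    R decreases as lam_s grows, so the root is bracketed by lo and hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if system_reliability(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def system_mtbf(lam_s: float) -> float:
    """Integral of R(t) from 0 to infinity, in closed form."""
    return 2 / (A_COEF * lam_s) - 1 / (B_COEF * lam_s)


if __name__ == "__main__":
    lam_s = solve_lam_s()
    print(f"lambda_S = {lam_s:.5f} per hour")
    print(f"system MTBF at lambda_S = .00044: {system_mtbf(0.00044):.1f} hours")
```

The bisection root rounds to the .00044 quoted in the text, and evaluating the MTBF at that rounded rate gives about 58 hours.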


Apportionment of availability can begin
by considering the system as a whole; then
the minimum acceptable $A_s$ is:

$$A_s = .992$$

An optimum combination of system parameters
$\theta$ (MTBF) and $M_c$ (MTTR) is sought
subject to the following constraint:

$$\frac{M_c}{\theta} \leq \frac{1-A_s}{A_s} = \frac{.008}{.992} = .00806$$


Several approaches to apportionment of
goals for the improvement of reliability and
maintainability are available and are discussed
in the literature. One of the simplest is to determine
the magnitude of improvement needed
in each characteristic alone in order to
satisfy the system $A_s$ requirement. The improvements
required are:

(a) MTTR required at minimum MTBF:

$$M_{c\,\text{required}} = .464 \text{ hr.} \approx 28 \text{ min.}$$

(b) MTBF required at predicted MTTR:

$$\theta_{\text{required}} = 124 \text{ hrs.}$$

$M_c$ in the above equation is computed from
equation (A-1). The frequency weighting
factor for the C-stage is not really a failure
rate but $2\lambda_C/3 = .0057$, the reciprocal of the
stage MTBF [1].

Thus, it is apparent that the availability requirement
can be met by improving the system
MTTR to 28 minutes while meeting the
minimum MTBF consistent with the reliability
requirement, or the availability requirement
can be met with the predicted MTTR of
one hour if the system MTBF can be raised
to 124 hours. Between these extremes there
are an unlimited number of combinations of
MTBF and MTTR that will also satisfy the
requirement.

In order to apportion reliability and maintainability
goals in an optimum manner, it is
desirable to predict the relative difficulty of
improving each. In this example, the predictions
take the form of cost functions,
although the actual resources involved may
include engineering and manufacturing manhours
as the major or sole variables. It is not
realistic to formulate reliability and maintainability
as deterministic functions of the resources
expended for their realization. At
best, the analyst can invoke past experience
to predict the functional relationships in a
largely subjective manner. Feasible improvement
actions may be listed, and engineers
asked to make optimistic, expected and
pessimistic predictions of the costs entailed
in each and the degree of change each would
produce in the reliability and maintainability
of the system. The actions can then be listed
as scaled sets and the distributions of the cost
versus improvement relationships estimated
[2].

For purposes of illustration, let it be assumed
that such an analysis is performed for
the Fire Control System and that the following
expected cost functions are obtained over
limited ranges of $\theta$ and $M_c$. The unit of cost
C is dollars × $10^4$.

$$C(M_c) = M_c^{-2}$$

$$C(\theta) = \frac{\theta^{2}}{3600}$$

The method of Lagrange multipliers is employed
to minimize the total cost function,
such that:

$$G = C(M_c) + C(\theta) + \alpha\left[\frac{M_c}{\theta} - \frac{1-A_s}{A_s}\right]$$
where $\alpha$ is a Lagrange multiplier. The partial
derivatives of the cost function are set equal
to zero:

$$\frac{\partial G}{\partial M_c} = C'(M_c) + \frac{\alpha}{\theta} = 0$$

$$\frac{\partial G}{\partial \theta} = C'(\theta) - \frac{\alpha M_c}{\theta^{2}} = 0$$

$$\frac{\partial G}{\partial \alpha} = \frac{M_c}{\theta} - \frac{1-A_s}{A_s} = 0$$

and the resulting system is solved simultaneously
for the optimum values of $\theta_0$ and
$M_{c0}$, yielding:

$$\theta_0 = 86.6 \text{ hrs.}$$

$$M_{c0} = .693 \text{ hr.} \approx 42 \text{ minutes}$$

These optimum system values can then be
apportioned in a convenient manner back to
block level and ultimately to equipment level.
Since all of the blocks have roughly the same
predicted maintainability, the apportioned
MTTR would be close to 42 minutes for each
type. Apportioned MTBF’s would be about
3393 hours for the S block, 1379 hours for
each C block and 1432 hours for each M
block. For purposes of illustration, the apportionments
are:

$$\theta_0 = 86.6 = \int_0^\infty R(t)\,dt = \frac{2}{41.38\lambda_S} - \frac{1}{43.84\lambda_S}$$

$$\lambda_S = .0002947, \quad \theta_S = 3393 \text{ hrs.}$$

$$\lambda_C = 2.46\lambda_S = .0007250, \quad \theta_C = 1379 \text{ hrs.}$$

$$\lambda_M = 2.37\lambda_S = .0006984, \quad \theta_M = 1432 \text{ hrs.}$$

$$M_{c0} = \frac{\lambda_S M_{cS} + \tfrac{2}{3}\lambda_C M_{cC} + 16\lambda_M M_{cM}}{\lambda_S + \tfrac{2}{3}\lambda_C + 16\lambda_M}$$

$$= \frac{.0002947\,M_{cS} + \tfrac{2}{3}(.0007250)(1.13)\,M_{cS} + 16(.0006984)(0.96)\,M_{cS}}{.0002947 + \tfrac{2}{3}(.0007250) + 16(.0006984)}$$

$$M_{cS} = .6707 \text{ hr.} \approx 40 \text{ minutes}$$

$$M_{cC} = 1.13\,M_{cS} = .7579 \text{ hr.} \approx 45 \text{ minutes}$$

$$M_{cM} = 0.96\,M_{cS} = .6439 \text{ hr.} \approx 39 \text{ minutes}$$
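
The cost optimization and the MTBF apportionment can be reproduced with a short Python sketch. Several things here are assumptions rather than statements from the source: the cost functions are read as C(Mc) = Mc^-2 and C(θ) = θ²/3600, the constraint is taken as the equality Mc/θ = .008, and all function and constant names are hypothetical. Under that reading the stationarity conditions reduce to θ·Mc = 60, which reproduces the optimum quoted in the text.

```python
from math import sqrt

K = 0.008            # ASSUMED constraint ratio M_c / theta
COST_SCALE = 3600.0  # ASSUMED divisor in C(theta) = theta**2 / 3600
RATIO_C, RATIO_M, N_M = 2.46, 2.37, 16


def total_cost(theta: float) -> float:
    """C(M_c) + C(theta) along the constraint M_c = K * theta (dollars x 10^4)."""
    m_c = K * theta
    return m_c ** -2 + theta ** 2 / COST_SCALE


def optimum():
    """Eliminating the multiplier from the three stationarity conditions gives
    theta * M_c = sqrt(COST_SCALE); with M_c = K * theta this fixes theta."""
    theta = sqrt(sqrt(COST_SCALE) / K)
    return theta, K * theta


def block_mtbfs(theta: float):
    """Apportion a system MTBF to the S, C and M blocks in the predicted ratios,
    using theta = 2 / (41.38 lam_s) - 1 / (43.84 lam_s)."""
    a = 1 + RATIO_C + N_M * RATIO_M
    b = 1 + 2 * RATIO_C + N_M * RATIO_M
    lam_s = (2 / a - 1 / b) / theta
    return 1 / lam_s, 1 / (RATIO_C * lam_s), 1 / (RATIO_M * lam_s)


if __name__ == "__main__":
    theta0, mc0 = optimum()
    print(f"theta_0 = {theta0:.1f} hrs, M_c0 = {mc0:.3f} hr")
    print("block MTBFs (S, C, M):", [round(v) for v in block_mtbfs(theta0)])
```

The closed form is used instead of a numerical optimizer so that the link to the three partial-derivative conditions stays visible; `total_cost` is included only to allow a quick check that nearby values of θ cost more.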
Appendix B

DEMONSTRATED  FLIGHT  RELIABILITY  (DFR) 
AND  RAW  SCORE  METHODS 


This  appendix  presents  two  of  the  meth¬ 
ods  currently  used  by  the  Lockheed  Missile 
and  Space  Company  to  assess  the  flight  phase 
reliability  for  POSEIDON  and  TRIDENT  mis¬ 
sile  subsystems. 

B.1 RAW SCORE METHOD

The material in this paragraph is taken
from [1].

B.1.1 Raw Score

Raw  score  is  a  measure  of  flight  success 
based  on  the  ratio  of  re-entry  bodies  (RBs) 
successful  over  the  number  launched.  Certain 
events  which  interfere  with  the  completion 
of  the  flight  may  cause  the  exercise  to  be 
declared  a  No-Test  (does  not  count  as  at¬ 
tempt  or  outcome). 

Raw  Score  for  the  missile  may  be  pre¬ 
sented  based  upon  many  criteria.  In  the 
MRR,  Raw  Score  is  presented  both  for  the 
missile  system  and  for  the  missile  body. 
The  missile  system  Raw  Score  includes  guid¬ 
ance  and  RB  component  flight  performance, 
but  excludes  nontactical  hardware  and  other 
FBM  subsystem  problems;  the  missile  body 
Raw  Score  additionally  excludes  guidance 
and  RB  problems.  The  general  rule  for  han¬ 
dling  excluded  problems  is  that  if  the  oc¬ 
currence  precluded  the  completion  of  the 
flight it is a No-Test, whereas if it merely
perturbed the result the performance is
considered successful.

B.1.2 Groundrules for Calculating Raw Score

a.  Calculations  are  based  upon  the  RBs 
launched  and  represent  a  success  ratio. 

b. Inadvertent actions (such as command
destruct) not precipitated by missile
malfunction will be counted as No-Test
(i.e., neither the attempt nor the result
will be counted).

c.  Failures  of  instrumentation  and  destruct 
components  will  be  excluded  from 
consideration. 

d.  The  failure/anomaly  caused  by  a  non¬ 
missile  subsystem  will  be  counted  on 
the  following  basis: 

(1)  No-Test  if  the  occurrence  precludes 
the  completion  of  the  missile  exer¬ 
cise,  and 

(2) Success to the extent that the missile
performed properly in response to the
input/stimuli.

e. For missile body raw score, additional
exclusions of guidance and RB failures
are taken on the same basis as described
in d. (1) and (2) above.
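
The groundrules above amount to a success ratio over the re-entry bodies that count as tests. The following Python sketch illustrates that bookkeeping. It is a simplified reading, not the contractor's scoring procedure: the `Outcome` labels, the `classify` helper and its two flags are hypothetical, and all excluded causes (groundrules b through d) are collapsed into a single flag.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    NO_TEST = "no_test"   # neither the attempt nor the result is counted


def classify(rb_succeeded: bool, excluded_cause: bool = False,
             flight_completed: bool = True) -> Outcome:
    """Score one re-entry body (simplified reading of groundrules b-d).

    excluded_cause: the problem came from a source excluded from the score
    (inadvertent action, instrumentation, non-missile subsystem).
    """
    if excluded_cause:
        # Excluded problems are a No-Test if they prevented completion of the
        # flight; otherwise the missile is credited with a success.
        return Outcome.SUCCESS if flight_completed else Outcome.NO_TEST
    return Outcome.SUCCESS if rb_succeeded else Outcome.FAILURE


def raw_score(outcomes) -> float:
    """Ratio of successful RBs to RBs launched, with No-Tests dropped."""
    counted = [o for o in outcomes if o is not Outcome.NO_TEST]
    if not counted:
        raise ValueError("no countable re-entry bodies")
    return sum(o is Outcome.SUCCESS for o in counted) / len(counted)


if __name__ == "__main__":
    # Hypothetical flight record of four re-entry bodies.
    record = [
        classify(True),
        classify(True),
        classify(False),
        classify(False, excluded_cause=True, flight_completed=False),
    ]
    print(raw_score(record))
```

For the missile body raw score of groundrule e, the same functions apply once guidance and RB failures are also flagged as excluded causes.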

B.2  DEMONSTRATED  FLIGHT 
RELIABILITY 

The material in this paragraph is taken
from [2], which was not changed except for
paragraph and figure numbering.

B.2.1  Background  of  Demonstrated  Flight 
Reliability 

Reliability  assessment  models  which  at¬ 
tempt  to  integrate  data  from  a  number  of 
sources (e.g., flight, system testing, package
testing)  usually  are  cumbersome  to  use  and, 
more  importantly,  are  dependent  upon  a 
number  of  assumptions.  For  use  with  flight 
data  alone,  a  model  is  needed  which  is  simple 
to  use,  is  easy  to  understand,  and  uses  a  mini¬ 
mum  of  assumptions.  The  Demonstrated 
Flight Reliability (DFR) was developed by
MARC  (2)  to  provide  a  method  more  realis¬ 
tic  than  Raw  Score  but  still  simple  to  calcu¬ 
late  and  relatively  assumption-free. 

Demonstrated  Flight  Reliability  is  de¬ 
fined  to  be  the  estimate,  using  flight  data 
only,  of  the  expected  percentage  of  successful 
re-entry  bodies.  DFR  is  based  on  mission 
phases  rather  than  missile  segments;  this 
reduces  the  complexity  of  the  calculation 
and  simplifies  the  process  of  attributing 
failures  to  the  proper  sources. 

The  derivation  of  DFR  is  simple  enough  to 
facilitate  modification  in  several  ways.  For 
example,  suppose  the  deployed  population 
consists  of  two  (or  more)  sub-populations 
of  missiles  with  presumably  different  relia¬ 
bilities  (e.g.,  POMP  and  pre-POMP  missiles). 
Unless the numbers of flights from each subpopulation
are roughly in the same proportion
as the fleet mix, a DFR computed on the
basis of all flight tests combined might not
accurately represent the fleet reliability.
However, DFR’s can be computed for each
of the sub-populations on the basis of its
flight data alone, and these DFR’s can be
combined  as  a  weighted  average  to  give  a 
DFR  representative  of  the  fleet  mix.  Another 
modification  is  to  compute  one  DFR  for  the 
missile  system  and  another  for  the  missile 
body  alone;  in  the  latter  case  failures  of  the 
guidance  system,  destruct  system,  and  re¬ 
entry  bodies  are  not  counted  against  the 
missile  body. 
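
The weighted average described above can be sketched as follows. The function name, the use of deployed missile counts as weights, and the numbers in the example are all illustrative assumptions; the text specifies only that the sub-population DFR's are combined as a weighted average representative of the fleet mix.

```python
def fleet_dfr(subpopulations) -> float:
    """Weighted average of sub-population DFRs.

    subpopulations: iterable of (dfr, deployed_count) pairs, where the
    deployed counts (ASSUMED weights) describe the fleet mix.
    """
    pairs = list(subpopulations)
    total = sum(count for _, count in pairs)
    if total <= 0:
        raise ValueError("fleet must contain at least one missile")
    return sum(dfr * count for dfr, count in pairs) / total


if __name__ == "__main__":
    # Hypothetical fleet: 300 missiles with DFR .90 and 100 with DFR .80.
    print(fleet_dfr([(0.90, 300), (0.80, 100)]))
```

Weighting by the fleet mix rather than by the number of flights is what keeps an over-tested sub-population from dominating the fleet figure.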


B.2.2  Definitions 

Mission 

The  mission  profile  consists  of  a  first  boost 
phase,  a  second  boost  phase,  and  a  deploy¬ 
ment  phase.  The  deployment  phase  in  turn 
consists  of  the  continuous  functioning 
of  a  portion  of  the  equipment  section 
concurrent  with  the  release,  at  given  inter¬ 
vals,  of  the  re-entry  bodies.  These  con¬ 
stitute  the  horizontal  deployment  phase 
and  several  individual  vertical  deployment 
phases. See figure B-1 for an illustration of
this. 

N 

The  number  of  re-entry  bodies  in  a  tactical 
configuration. 

Reliability 

It  should  be  noted  that  the  reliabilities 
defined  below  are  all  conditioned  by  the 
successful  completion  of  the  relevant  pre¬ 
ceding  portions  of  the  mission.  For  ex¬ 
ample,  the  reliability  of  the  horizontal 
deployment  phase  between  the  planned 
times  of  the  first  and  second  re-entry 
body releases (R,2 in figure B-1) assumes
the  successful  completion  of  the  first  and 
second  boost  phases,  and  the  horizontal 
deployment  phase  through  the  planned 
time of the first re-entry body release.
However,  it  is  independent  of  the  success 
or  failure  of  the  first  re-entry  body’s 
vertical  deployment  phases. 

It  should  also  be  noted  that  the  terminol¬ 
ogy  in  the  definitions  is  common  to  both 
missile  system  reliability  and  missile  body 
reliability.  When  discussing  missile  body 
reliability,  the  reliabilities  of  the  guidance 
system,  destruct  system,  and  re-entry  body 
are  excluded  where  appropriate. 

RB1  Reliability  of  the  first  boost  phase. 

This  is  the  probability  that  a  missile  which 
enters  the  first  boost  phase  will  not  fail 
during  it. 

RB2 Reliability of the second boost phase,
given successful completion of the first
boost phase.

This  is  the  probability  that  a  missile  which 
enters  the  second  boost  phase  will  not 
fail  during  it. 

R,,  Reliability  of  the  horizontal  deployment 
phase  from  initiation  through  the  planned 
time  of  deployment  of  the  first  re-entry 
body,  given  the  successful  completion  of 
the  first  and  second  boost  phases. 

This  is  the  probability  that  a  missile  which 
enters  the  horizontal  deployment  phase 
will  not  fail  before  the  planned  time  of  the 
first  re-entry  body  release  (considering 
only  that  portion  of  the  missile  commoi 
to  the  functioning  of  all  re-entry  bodies). 

Ri,  for  j  =  2.  3,  ....  N  this  is  the  reliability 
of  the  horizontal  deployment  phase  be¬ 
tween  the  planned  times  of  deployment  of 
the  (j'IVst  and  j-th  re-entry  bodies,  given 
the  successful  completion  of  the  two  boost 
phases  and  all  preceding  portions  of  the 
horizontal  deployment  phase. 

These  are  the  probabilities  that  a  missile 
which  enters  the  horizontal  deployment 
phase  and  successfully  passes  the  planned 
time  of  a  given  re-entry  body  release  will 
not  fail  before  the  planned  time  of  the 
next  re-entry  body  release  (considering 


only  that  portion  of  the  missile  common 
to  the  functioning  of  all  re-entry  bodies). 

^I(N  +  I) 

For  notationa)  convenience,  this  is  defined 
to  be  zero. 

Conceptually,  this  can  be  interpreted  as 
meaning  that  there  is  probability  0  of  a 
tactical  missile  deploying  N+l  or  more 
re-entry  bodies. 

Rv  Reliability  of  a  single  vertical  deployment 
phase. 

This  is  the  probability  of  successful  opera¬ 
tion  of  that  portion  of  the  missile  which 
is  unique  to  the  functioning  of  a  single 
re-entry  body.  Errors  in  accuracy  (i.e„ 
re-entry  bodies  off  target)  are  attributed 
to  the  phase  which  was  responsible,  not  to 
the  vertical  deployment  pnase  (unless 
appropriate).  Rv  is  assumed  to  be  the 
same  for  all  re-entry  bodies  on  a  tactical 
missile. 

B.2.3  Expected  Percentage  of  Successful 
Re-Entry  Bodies 

The  expected  percentage  of  successful 
re-entry  bodies  out  of  N  re-entry  bodies  on  a 
tactical  missile  is  defined  as 

$$R = \frac{1}{N} \sum_{k=1}^{N} k \cdot \Pr\{\text{exactly } k \text{ successful re-entry bodies}\} \tag{B-1}$$

For exactly k re-entry bodies to be successful, the first and second
boost phases must be successful, the horizontal deployment phase must be
successful through the planned deployment time of i ≥ k re-entry bodies,
and of these i re-entry bodies exactly k must be successes and i-k must
be failures.

The probability of the missile operating through the planned deployment
time of the i-th re-entry body and failing prior to the planned
deployment time of the (i+1)-st is

$$R_{B1} R_{B2} \left( \prod_{j=1}^{i} R_{Ij} \right) \left[ 1 - R_{I(i+1)} \right] \tag{B-2}$$

for i = 1, ..., N.

Also, the probability that exactly k of i re-entry bodies successfully
complete their vertical deployment phases is

$$\binom{i}{k} R_V^{k} (1 - R_V)^{i-k} \quad \text{for } k = 0, \ldots, i. \tag{B-3}$$

Combining  (B-2)  and  (B-3),  the  probability 
that  exactly  k  re-entry  bodies  are  successful 
is 


$$R_{B1} R_{B2} \sum_{i=k}^{N} \left( \prod_{j=1}^{i} R_{Ij} \right) \left[ 1 - R_{I(i+1)} \right] \binom{i}{k} R_V^{k} (1 - R_V)^{i-k} \tag{B-4}$$


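where the summation is needed to take into account all possible missile
operating success times long enough to deploy at least k re-entry
bodies. Substituting (B-4) into (B-1) yields the expected percentage of
successful re-entry bodies

$$R = \frac{1}{N} R_{B1} R_{B2} \sum_{k=1}^{N} k \sum_{i=k}^{N} \left( \prod_{j=1}^{i} R_{Ij} \right) \left[ 1 - R_{I(i+1)} \right] \binom{i}{k} R_V^{k} (1 - R_V)^{i-k} \tag{B-5}$$

Simplifying, we obtain, when N > 1,

$$\begin{aligned}
R &= \frac{1}{N} R_{B1} R_{B2} \sum_{k=1}^{N-1} k \sum_{i=k}^{N-1} \left( \prod_{j=1}^{i} R_{Ij} \right) \left[ 1 - R_{I(i+1)} \right] \binom{i}{k} R_V^{k} (1 - R_V)^{i-k} \\
&\quad + \frac{1}{N} R_{B1} R_{B2} \left( \prod_{j=1}^{N} R_{Ij} \right) \sum_{k=1}^{N} k \binom{N}{k} R_V^{k} (1 - R_V)^{N-k} \\
&= \frac{1}{N} R_{B1} R_{B2} \sum_{k=1}^{N-1} k \sum_{i=k}^{N-1} \left( \prod_{j=1}^{i} R_{Ij} \right) \left[ 1 - R_{I(i+1)} \right] \binom{i}{k} R_V^{k} (1 - R_V)^{i-k} \\
&\quad + R_{B1} R_{B2} R_V \prod_{j=1}^{N} R_{Ij}
\end{aligned} \tag{B-6}$$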


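where in the first step we used the fact that $R_{I(N+1)} = 0$ by
definition, and in the second step we recognized that the summation in
the second term was just the definition of the expected value of a
binomial random variable with parameters N and $R_V$.

We will now show that (B-5) and (B-6) are equivalent to

$$R = \frac{1}{N} R_{B1} R_{B2} R_V \sum_{i=1}^{N} \prod_{j=1}^{i} R_{Ij} \tag{B-7}$$

First, when N = 1, using (B-5) we obtain
$R_{B1} R_{B2} R_{I1} (1 - R_{I2}) R_V$, and since
$R_{I2} = R_{I(N+1)} = 0$ by definition, this in turn becomes
$R_{B1} R_{B2} R_V R_{I1}$, which is clearly (B-7) with N = 1.

Next, we will assume that for N = n (B-6) and (B-7) are equivalent, and
use this to show that they must be equivalent for N = n + 1. When
N = n + 1 (B-6) becomes

$$\begin{aligned}
R &= \frac{1}{n+1} R_{B1} R_{B2} \sum_{k=1}^{n} k \sum_{i=k}^{n} \left( \prod_{j=1}^{i} R_{Ij} \right) \left[ 1 - R_{I(i+1)} \right] \binom{i}{k} R_V^{k} (1 - R_V)^{i-k} \\
&\quad + R_{B1} R_{B2} R_V \prod_{j=1}^{n+1} R_{Ij} \\
&= \frac{1}{n+1} R_{B1} R_{B2} \sum_{k=1}^{n-1} k \sum_{i=k}^{n-1} \left( \prod_{j=1}^{i} R_{Ij} \right) \left[ 1 - R_{I(i+1)} \right] \binom{i}{k} R_V^{k} (1 - R_V)^{i-k} \\
&\quad + \frac{1}{n+1} R_{B1} R_{B2} \left( \prod_{j=1}^{n} R_{Ij} \right) \left[ 1 - R_{I(n+1)} \right] \sum_{k=1}^{n} k \binom{n}{k} R_V^{k} (1 - R_V)^{n-k} \\
&\quad + R_{B1} R_{B2} R_V \prod_{j=1}^{n+1} R_{Ij}
\end{aligned}$$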
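$$\begin{aligned}
&= \frac{1}{n+1} R_{B1} R_{B2} R_V \sum_{i=1}^{n} \prod_{j=1}^{i} R_{Ij} - \frac{n}{n+1} R_{B1} R_{B2} R_V \prod_{j=1}^{n} R_{Ij} \\
&\quad + \frac{n}{n+1} R_{B1} R_{B2} R_V \left( \prod_{j=1}^{n} R_{Ij} \right) \left[ 1 - R_{I(n+1)} \right] \\
&\quad + R_{B1} R_{B2} R_V \prod_{j=1}^{n+1} R_{Ij}
\end{aligned}$$

where the first term was simplified using the inductive hypothesis and
the second term was simplified in the same fashion as the second term of
(B-6). Algebraic simplification of this expression leads to

$$R = \frac{1}{n+1} R_{B1} R_{B2} R_V \sum_{i=1}^{n+1} \prod_{j=1}^{i} R_{Ij}$$

which is (B-7) with N = n + 1.

B.2.4  Simplification for Equal Horizontal
Deployment Phase Reliabilities

Next, we make the assumption that
$R_{I2} = R_{I3} = \ldots = R_{IN} = R_I$; that is, that the horizontal
deployment phase reliability is equal between each pair of successive
deployments. In this case, (B-6) becomes

$$\begin{aligned}
R &= \frac{1}{N} R_{B1} R_{B2} R_V \left( R_{I1} + \sum_{i=2}^{N} R_{I1} \prod_{j=2}^{i} R_I \right) \\
&= \frac{1}{N} R_{B1} R_{B2} R_V \left[ R_{I1} \sum_{i=1}^{N} R_I^{\,i-1} \right] \\
&= R_{B1} R_{B2} R_V R_{I1} \frac{1 - R_I^{N}}{N (1 - R_I)}
\end{aligned} \tag{B-8}$$

When $R_I$ is close to 1, which must be the case for the missile to be at
all reliable, we can simplify (B-8) further with the aid of a pair of
approximations. Setting $Q = 1 - R_I$ for notational ease, we have

$$\begin{aligned}
\frac{1 - R_I^{N}}{N (1 - R_I)} &= \frac{1 - (1 - Q)^{N}}{NQ} \\
&= \left[ 1 - \left( 1 - \binom{N}{1} Q + \binom{N}{2} Q^{2} - \binom{N}{3} Q^{3} + \cdots \right) \right] / NQ \\
&\approx \left[ NQ - \binom{N}{2} Q^{2} \right] / NQ \\
&= 1 - \frac{N-1}{2} Q \\
&\approx (1 - Q)^{\frac{N-1}{2}} \\
&= R_I^{\frac{N-1}{2}}
\end{aligned} \tag{B-9}$$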
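The equivalence of (B-5) and (B-7), and the closed form (B-8), can be
checked numerically. The following is a minimal sketch rather than part
of the manual's procedure; the function names and the sample
reliabilities are hypothetical and chosen only for illustration.

```python
from math import comb, prod


def expected_fraction_b5(r_b1, r_b2, r_v, r_i):
    """Direct double sum of (B-5); r_i[j-1] holds R_Ij for j = 1..N."""
    n = len(r_i)
    r_ext = list(r_i) + [0.0]  # R_I(N+1) = 0 by definition
    total = 0.0
    for k in range(1, n + 1):
        for i in range(k, n + 1):
            reach_i = prod(r_ext[:i])      # survive through the i-th release
            stop_after_i = 1.0 - r_ext[i]  # fail before the (i+1)-st release
            binom = comb(i, k) * r_v**k * (1.0 - r_v) ** (i - k)
            total += k * reach_i * stop_after_i * binom
    return r_b1 * r_b2 * total / n


def expected_fraction_b7(r_b1, r_b2, r_v, r_i):
    """Simplified form (B-7): sum over i of the product of R_Ij, j = 1..i."""
    n = len(r_i)
    partial_products = sum(prod(r_i[:i]) for i in range(1, n + 1))
    return r_b1 * r_b2 * r_v * partial_products / n


def expected_fraction_b8(r_b1, r_b2, r_v, r_i1, r_i, n):
    """Closed form (B-8), valid when R_I2 = ... = R_IN = r_i (r_i != 1)."""
    return r_b1 * r_b2 * r_v * r_i1 * (1.0 - r_i**n) / (n * (1.0 - r_i))


if __name__ == "__main__":
    unequal = [0.99, 0.98, 0.97, 0.995]   # hypothetical R_I1..R_I4
    print(expected_fraction_b5(0.95, 0.96, 0.90, unequal))
    print(expected_fraction_b7(0.95, 0.96, 0.90, unequal))
```

For any choice of inputs the first two functions should agree to within
floating-point rounding, and the third should agree with them whenever
the horizontal deployment reliabilities after the first are all equal.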


Figure B-2 below summarizes the accuracy of this approximation for
selected values of N and $R_I$. We can see that in general the
approximation is accurate to three or more significant figures for
realistic values of N and $R_I$. Substitution of (B-9) into (B-8) yields
the approximation for the expected percentage of successful re-entry
bodies

$$R \approx R_{B1} R_{B2} R_V R_{I1} R_I^{\frac{N-1}{2}} \tag{B-10}$$


R_I      N     (1 - R_I^N) / [N(1 - R_I)]     R_I^((N-1)/2)

.990     8     .9657                          .9654
        10     .9562                          .9558
        12     .9468                          .9462
        14     .9375                          .9368

.995     8     .9827                          .9826
        10     .9778                          .9777
        12     .9730                          .9728
        14     .9681                          .9679

.999     8     .9965                          .9965
        10     .9955                          .9955
        12     .9945                          .9945
        14     .9935                          .9935

Figure B-2.  Accuracy of Approximation for Selected
N and R_I
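The entries of Figure B-2 can be recomputed directly from the two sides
of (B-9). This is an illustrative sketch only; the function names are
hypothetical.

```python
def exact_factor(r_i, n):
    """Left side of (B-9): (1 - R_I^N) / [N (1 - R_I)]."""
    return (1.0 - r_i**n) / (n * (1.0 - r_i))


def approx_factor(r_i, n):
    """Right side of (B-9): R_I raised to the power (N - 1) / 2."""
    return r_i ** ((n - 1) / 2.0)


if __name__ == "__main__":
    for r_i in (0.990, 0.995, 0.999):
        for n in (8, 10, 12, 14):
            print(f"{r_i:.3f}  {n:2d}  "
                  f"{exact_factor(r_i, n):.4f}  {approx_factor(r_i, n):.4f}")
```

The two columns differ by less than .001 over the whole range shown,
which is the basis for using (B-10) in place of (B-8).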
B.2.5  Simplification for Constant Failure
Rate for Horizontal Deployment Phase

The nature of the horizontal deployment phase makes plausible the
assumption of a constant failure rate for the duration of the phase.
This means that the probability the phase will not fail before time T+t,
given that it has not failed before time T, is $e^{-\lambda t}$, where
$\lambda$ is the failure rate. If the time from thrust termination to the
deployment of the first re-entry body is $\Delta_1$, and the time between
deployments of any two successive re-entry bodies is $\Delta$, we are led
to

$$R_{I1} = e^{-\lambda \Delta_1} \quad \text{and} \quad R_I = e^{-\lambda \Delta}. \tag{B-11}$$

Substitution of (B-11) into (B-10) yields the formula used for the
expected percentage of successful re-entry bodies,

$$R \approx R_{B1} R_{B2} R_V \, e^{-\lambda \left[ \Delta_1 + \frac{(N-1)\Delta}{2} \right]}. \tag{B-12}$$


B.2.6  Calculation  of  Demonstrated  Flight 
Reliability 

To calculate Demonstrated Flight Reliability, we need the following data
for each flight test:

•  Whether the first boost phase was a success, failure, or no-test.

•  Whether the second boost phase was a success, failure, or no-test.

•  Whether the horizontal deployment phase was a success, failure, or
no-test, and, if a failure, the operating time prior to failure.

•  For each re-entry body, whether its vertical deployment phase was a
success, failure, or no-test.

When calculating missile body DFR, failures attributed to the guidance
system, destruct system, or re-entry body are excluded from
consideration. For example, if the missile went off course during the
first boost phase due to the guidance system (but otherwise the missile
functioned correctly during this phase), and if the second stage motor
failed, then the first boost phase would be a success, the second boost
phase a failure, and the remaining phases all no-tests.

Let $S_{B1}$ and $F_{B1}$ be the number of first boost phase successes
and failures for the flights under consideration. $S_{B2}$ and $F_{B2}$
are defined similarly for the second boost phase, as are $S_I$ and $F_I$
for the horizontal deployment phase. We also let $T_I$ be the total
operating time (time until failure) for those horizontal deployment
phases which failed. Finally, we let $S_V$ and $F_V$ be the number of
vertical deployment phase successes and failures. A value for
$S_V + F_V$ other than N is possible for missiles flown in other than the
tactical configuration. These numbers are then used to calculate the
following estimates.

$$\begin{aligned}
\hat{R}_{B1} &= S_{B1} / (S_{B1} + F_{B1}) \\
\hat{R}_{B2} &= S_{B2} / (S_{B2} + F_{B2}) \\
\hat{R}_{V} &= S_{V} / (S_{V} + F_{V}) \\
\hat{\lambda} &= F_I / \{ S_I [ \Delta_1 + (N-1) \Delta ] + T_I \}
\end{aligned} \tag{B-13}$$

Substituting the estimates (B-13) into formula (B-12) for expected
percentage of successful re-entry bodies gives the formula defining
Demonstrated Flight Reliability:

$$\mathrm{DFR} = \hat{R}_{B1} \hat{R}_{B2} \hat{R}_{V} \, e^{-\hat{\lambda} \left[ \Delta_1 + \frac{(N-1)\Delta}{2} \right]} \tag{B-14}$$

B.2.7  Example  and  Comparison  with 
Raw  Score 

Suppose the tactical configuration has N = 8, $\Delta_1$ = 1 minute, and
$\Delta$ = 2 minutes, and assume the following flight results (Figure
B-3):

first boost   second boost   horizontal            vertical
phase         phase          deployment phase      deployment phase

success       success        success               8 successes
success       success        success               7 successes, 1 failure
success       success        success               6 successes
failure       no-test        no-test               4 no-tests
success       success        success               8 successes
success       success        failure at 12 min.    5 successes, 1 failure, 2 no-tests
success       success        success               7 successes, 1 failure
success       failure        no-test               4 no-tests
success       success        success               8 successes

Figure B-3.  Flight Results

These results yield the following statistics:

$S_{B1} = 8$,  $F_{B1} = 1$
$S_{B2} = 7$,  $F_{B2} = 1$
$S_I = 6$,  $F_I = 1$,  $T_I = 12$
$S_V = 49$,  $F_V = 3$

Substitution of the above into (B-13) yields the estimates:

$$\begin{aligned}
\hat{R}_{B1} &= 8/(8+1) = .889 \\
\hat{R}_{B2} &= 7/(7+1) = .875 \\
\hat{R}_{V} &= 49/(49+3) = .942 \\
\hat{\lambda} &= 1 / \{ 6 \cdot [ 1 + (8-1) 2 ] + 12 \} = .010
\end{aligned}$$

Finally, substituting the estimates into (B-14) gives the Demonstrated
Flight Reliability for the given data:

$$\mathrm{DFR} = (.889)(.875)(.942) \, e^{-.010 \left[ 1 + \frac{(8-1)2}{2} \right]} = .676$$

By  comparison,  the  Raw  Score  (RS)  is  de¬ 
fined  as  the  ratio  of  the  number  of  success¬ 
ful  vertical  deployment  phases  to  the  total 
number  attempted.  For  this  example,  we 
have 

$$RS = \frac{8 + 7 + 6 + 0 + 8 + 5 + 7 + 0 + 8}{8 + 8 + 6 + 4 + 8 + 8 + 8 + 4 + 8} = .790.$$

In  this  case,  RS  is  higher  than  DFR.  Other 
examples  can  be  constructed  in  which  the 
reverse  is  true.  The  conclusion  is  that  it  is 
incorrect  to  attempt  to  interpret  the  Raw 
Score  as  a  measure  of  the  percentage  of 
re-entry  bodies  which  would  be  expected 
to  be  successful. 
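The calculation of B.2.6 and B.2.7 can be sketched in a few lines. The
function and argument names below are hypothetical; the formulas are
(B-13) and (B-14) and the Raw Score definition given above, applied to
the Figure B-3 counts.

```python
from math import exp


def demonstrated_flight_reliability(s_b1, f_b1, s_b2, f_b2,
                                    s_i, f_i, t_i, s_v, f_v,
                                    n, delta_1, delta):
    """DFR per (B-14), using the estimates of (B-13)."""
    r_b1 = s_b1 / (s_b1 + f_b1)
    r_b2 = s_b2 / (s_b2 + f_b2)
    r_v = s_v / (s_v + f_v)
    # Failure rate: failures over total horizontal-phase operating time.
    lam = f_i / (s_i * (delta_1 + (n - 1) * delta) + t_i)
    return r_b1 * r_b2 * r_v * exp(-lam * (delta_1 + (n - 1) * delta / 2.0))


def raw_score(successes, attempts):
    """Ratio of successful vertical deployment phases to number attempted."""
    return sum(successes) / sum(attempts)


if __name__ == "__main__":
    dfr = demonstrated_flight_reliability(
        s_b1=8, f_b1=1, s_b2=7, f_b2=1,
        s_i=6, f_i=1, t_i=12.0, s_v=49, f_v=3,
        n=8, delta_1=1.0, delta=2.0)
    rs = raw_score([8, 7, 6, 0, 8, 5, 7, 0, 8],
                   [8, 8, 6, 4, 8, 8, 8, 4, 8])
    print(f"DFR = {dfr:.3f}, RS = {rs:.3f}")
```

Carrying the estimates at full precision gives a DFR of about .678; the
value .676 in the text results from rounding the estimates (in
particular the failure rate to .010) before substituting into (B-14).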
Appendix C

STATISTICAL  TESTS  OF  HYPOTHESES 


This Appendix provides an introduction to some methods of Statistical
Tests of Hypotheses which are useful in Reliability and Availability
assessment. § C.1 considers the Kolmogorov-Smirnov goodness-of-fit test;
§ C.2, the χ² test for Compatibility of Bayesian Prior and Posterior
Estimates in the Exponential Model; § C.3, Laplace's test for
homogeneity; and § C.4, the "label test" for independence.

Very generally, a test of hypothesis is a procedure for deciding whether
to accept or reject some assumption or hypothesis about a PDF, CDF or its
parameters on the basis of available test data. In goodness-of-fit
tests, for instance, the decision is to accept or reject the hypothesis
that a sample originates from a preselected CDF, F(t). A very common
type of simple hypothesis test is a decision rule to accept or reject an
assumed value $\theta_0$ (the null hypothesis $H_0$) of the parameter of
a distribution against another specified value $\theta_1$ (the alternate
hypothesis $H_1$). In composite hypotheses, not all parameters of a
frequency function are specified. In this case, for a two parameter
distribution, an example would be: $H_0$ (the null hypothesis),
$\sigma_0 = 1$, $\mu_0 = 0$; $H_1$ (the alternate hypothesis),
$\sigma_1 = 0.5$, $\mu_1 > 0$.

Many tests can be constructed for given hypotheses but, just like
estimation, tests are expected to display certain desirable qualities.
A "best" test for a simple hypothesis, for instance, is one which, for a
given type I error (or $\alpha$, the probability that $H_0$ is true but
the sample falls in the critical region of the test and $H_0$ is
rejected), minimizes the type II error (or $\beta$, the probability that
$H_1$ is true but the sample falls outside the critical region of the
test and $H_1$ is rejected). The Neyman-Pearson Lemma [1, p. 214] permits
the construction of best tests of simple hypotheses. For composite
hypotheses, the choice of a best test is often based on the
consideration of the power function $P(\theta)$ of the test [1, p. 54],
which is the function of the parameter $\theta$ that gives the
probability that the sample point will fall in the critical region of
the test when $\theta$ is the true value of the parameter.

Since $P(\theta) = 1 - \beta$, where now $\beta$ is a function of
$\theta$, seeking a test that minimizes the type II error $\beta$ is
equivalent to seeking one that maximizes the power function $P(\theta)$.

C.1  THE KOLMOGOROV-SMIRNOV
GOODNESS-OF-FIT TEST


There are many goodness-of-fit tests, with the χ² [1, p. 347] test one of
the most popular. The χ² method, however, is not an exact method and
requires the classification of data into cells with at least five data
points per cell. Thus, the χ² test requires fairly large samples. This is
not the case with the Kolmogorov-Smirnov test described in this
appendix, which is exact and is applicable to small samples. A note of
caution, however, is in order. The Kolmogorov-Smirnov test may not
always provide accurate results where small samples are involved. It is
shown in [2] specifically that at least 40 samples are required to
distinguish between fairly close members of the Lognormal, Gamma, and
Weibull distributions by means of the Kolmogorov-Smirnov goodness-of-fit
test at the 0.1 level of significance.

Let $t_{(1)}, t_{(2)}, t_{(3)}, \ldots, t_{(x)}$ denote an ordered sample
of size x from a population with CDF F(t), and let $S_x(t)$ denote the
empirical distribution function of $t_{(1)}, t_{(2)}, \ldots, t_{(x)}$,
defined as follows:

$$S_x(t) = \begin{cases}
0, & t < t_{(1)} \\
k/x, & t_{(k)} \le t < t_{(k+1)} \\
1, & t \ge t_{(x)}
\end{cases}$$
Figure C-1.  The Empirical CDF and the Kolmogorov-Smirnov Limits (dashed lines)


Thus, $S_x(t)$ defines a functional ladder as depicted in figure C-1 for
a particular sample of size 10.

If F(t) were known, then it would be possible to calculate the value of
$|F(t) - S_x(t)|$ for any desired value of t and thus to determine the
maximum vertical distance between the graphs of F(t) and $S_x(t)$ over
the range of possible t values. Let us denote this maximum distance as:

$$D_x = \max_t |F(t) - S_x(t)|$$

Kolmogorov and Smirnov have shown that the distribution of $D_x$ does
not depend on F(t), and have proceeded to tabulate critical values
$D_x^{\alpha}$ of $D_x$ as shown in Figure C-2.

Often F(t) is not so much known to the data analyst, as it is selected as
a candidate CDF for the data at hand. As such, F(t) often includes one
or two (more rarely three) unknown parameters, say $\theta$ and
$\gamma$, and is relabeled $F(t; \theta)$ or $F(t; \theta, \gamma)$.
Such parameters can be estimated from the data itself. If, for instance,
$F(t; \theta)$ is the exponential, that is,
$F(t; \theta) = 1 - e^{-t/\theta}$, then

$$\hat{\theta} = \sum_{i=1}^{x} t_{(i)} / x,$$

and the "known" CDF becomes the estimated CDF
$F(t; \hat{\theta}) = 1 - e^{-t/\hat{\theta}}$. For such estimated CDF's,
the Kolmogorov-Smirnov values of $D_x^{\alpha}$ are no longer strictly
valid, but depend on the functional form of the estimated CDF.

Figure C-2.  Critical Values for $D_x$ in the Kolmogorov-Smirnov Test

The procedure, then, to test the hypothesis of goodness-of-fit of a
hypothetical distribution to data at hand is:

a)  Select a significance level, $\alpha$. The lower a significance
level (e.g., 0.01) the more likely a good fit to any candidate CDF will
be demonstrated. The higher a significance level (e.g., 0.20), the less
likely any CDF will be shown to fit the data, but also the greater the
likelihood of rejecting a particular CDF as false when it is actually
true.

b)  Draw the empirical CDF $S_x(t)$.

c)  Draw the values of $D_x^{\alpha}$ about $S_x(t)$ to form an
acceptance region.

d)  Draw the candidate CDF F(t), $F(t; \hat{\theta})$, or
$F(t; \hat{\theta}, \hat{\gamma})$, if the appropriate $D_x^{\alpha}$ has
been tabulated.

e)  If the candidate CDF remains within the acceptance region, the
hypothesis of a good fit is accepted at the selected significance level,
otherwise it is rejected.

As an example, assume that for the empirical CDF of figure C-1, one
shows the Kolmogorov-Smirnov limits corresponding to $\alpha$ = 0.2 at a
distance of 0.32 (see figure C-2) of $S_{10}(t)$. These limits are the
dashed lines on figure C-1. Assume also that the candidate CDF for the
fit is $F(t) = 1 - e^{-t/\theta} = 1 - e^{-t/5}$. In this expression,
$\theta$ = 5 is not estimated from the data. It is assumed to be known.
F(t) is drawn in figure C-1 and seen to cross the boundary of the
Kolmogorov-Smirnov acceptance region. Thus, F(t) is rejected as a
possible CDF for the given data at the 0.2 level of significance.

The two examples which follow are useful in reliability assessment where
it is desirable to know whether times to failure are exponential, and in
availability assessment where times to repair are to be tested for
lognormality.

Example of Curve Fitting the Exponential
Model

The Kolmogorov-Smirnov Goodness of Fit Test specifically applied to the
exponential by Lilliefors [3] will be used as an illustration. The model
to be fitted is $F(t) = 1 - e^{-\mu t}$ where $\mu$ is the repair rate.

It is assumed that the following repair times data are available.

$M_{c_1}$ = 7.0 hrs.      $M_{c_6}$ = 2.3 hrs.
$M_{c_2}$ = 5.0 hrs.      $M_{c_7}$ = 2.0 hrs.
$M_{c_3}$ = 3.9 hrs.      $M_{c_8}$ = 1.7 hrs.
$M_{c_4}$ = 3.0 hrs.      $M_{c_9}$ = 1.0 hr.
$M_{c_5}$ = 2.5 hrs.      $M_{c_{10}}$ = 0.5 hr.

The times have been ordered in preparation for the goodness of fit test.

An appropriate ML repair rate estimate from the sample is:

$$\hat{\mu} = n \Big/ \sum_{i=1}^{n} M_{c_i}$$

where n = 10, the repair sample size.

The sample data is plotted as a cumulative distribution of observed
repair times. An expected distribution is also plotted from the relation

$$P(M_c) = 1 - e^{-\hat{\mu} M_c}$$

as shown in Figure C-3.

Figure C-3.  Cumulative Distribution of Repair Times
with Kolmogorov-Smirnov Limits
(Exponential)


The Kolmogorov-Smirnov test, as modified by Lilliefors, is then applied
to test the hypothesis that the data are from an exponential
distribution with mean $1/\hat{\mu}$. The statistic evaluated is D, the
largest absolute deviation between the observed and expected ordinates
of the cumulative distribution. Therefore

$$D = \max \{ |P(M_c) - S(M_c)| \}$$

where

$M_c$ = measured repair time
$P(M_c)$ = the computed cumulative frequency
$S(M_c)$ = the observed cumulative frequency

The sampling distribution of D is known (figure C-4). Given a sample of
n observed repair times and a significance level $\alpha$, the result
$D > K(n, \alpha)$ supports rejection of the hypothesis with confidence
$(1 - \alpha)$; otherwise the hypothesis is not rejected.

From our given repair data, the hypothesis of exponential distribution
with mean 2.89 hours is not rejected.
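The statistic D for the repair data above can be computed as follows.
This is a minimal sketch with hypothetical function names; it evaluates
the largest gap between the fitted exponential CDF and the empirical
step function on both sides of each step, and leaves the comparison
against the tabulated $K(n, \alpha)$ of figure C-4 to the analyst.

```python
from math import exp


def ks_statistic(sample, cdf):
    """Largest vertical distance between cdf and the empirical CDF."""
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, t in enumerate(ordered, start=1):
        f = cdf(t)
        # The empirical CDF jumps from (i-1)/n to i/n at the i-th point.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def exponential_fit_statistic(times):
    """Return (estimated mean, D) for an exponential fitted by its mean."""
    mean = sum(times) / len(times)          # reciprocal of the ML rate
    return mean, ks_statistic(times, lambda t: 1.0 - exp(-t / mean))


if __name__ == "__main__":
    repair_hours = [7.0, 5.0, 3.9, 3.0, 2.5, 2.3, 2.0, 1.7, 1.0, 0.5]
    mean, d = exponential_fit_statistic(repair_hours)
    print(f"mean = {mean:.2f} hrs, D = {d:.3f}")
```

For the ten repair times listed, the estimated mean is 2.89 hours and D
is roughly 0.24, attained just below the 1.7-hour observation.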

Example of Curve Fitting Lognormal (Normal)
Data

Assume ordered observed repair times in hours are as follows:

$M_{c_1}$ = 6.1      $M_{c_6}$ = 2.5
$M_{c_2}$ = 4.6      $M_{c_7}$ = 2.2
$M_{c_3}$ = 3.5      $M_{c_8}$ = 2.0
$M_{c_4}$ = 3.0      $M_{c_9}$ = 2.0
$M_{c_5}$ = 3.0      $M_{c_{10}}$ = 0.6

If the exponentiality goodness-of-fit test of the previous example is
applied to the data, the null exponentiality hypothesis would be
rejected at the 0.1 significance level.

This is seen in figure C-5. Thus, it becomes necessary to fit a
lognormal distribution to the observed data. First the data points are
transformed to their logarithms ($X_i = \ln M_{c_i}$), then, in a
procedure following exactly the previous example, but using 0.1 level of
significance entries from figure C-6 instead of C-4, one sees that the
null hypothesis of lognormality is accepted at that level.

ABSOLUTE VALUES OF THE MAXIMUM DIFFERENCE D BETWEEN SAMPLE AND
POPULATION CUMULATIVE FRACTIONS SIGNIFICANT AT THE 20, 15, 10, 5 AND 1
PERCENT LEVELS
n = sample size

Sample Size n      Level of Significance α

Figure C-4.  Kolmogorov-Smirnov Limit Factors
K(n, α) for Exponential Distribution
with Estimated Mean

Figure C-5.  Cumulative Distribution of Repair Times
with Kolmogorov-Smirnov Limits (horizontal axis: $M_c$, hrs.)

C.2  TEST OF HYPOTHESIS FOR
COMPATIBILITY OF BAYESIAN
PRIOR AND POSTERIOR ESTIMATES
IN THE EXPONENTIAL MODEL

In the case of exponential data with gamma conjugate prior (see
§ 5.4.6.8), the prior estimate of $\lambda$ is $\lambda_0 = d/r$ where d
are pseudo-failures, and r are pseudo test hours. The estimate of
$\lambda$ is $\hat{\lambda} = x/T$, where x are actual failures and T is
actual test time. The problem is to construct a test-of-hypothesis to
accept or reject $H_0: \lambda_0 = d/r$. The hypothesis test, based on
the distribution of $\hat{\lambda}$ for exponential data (see Appendix D
§ D.1), is as follows:

$$\frac{\chi^2_{\alpha/2}(2x)}{2T} < \lambda < \frac{\chi^2_{1-\alpha/2}(2x+2)}{2T} \tag{C.2}$$

where the $\chi^2$'s are the chi-square values at the
$(\alpha/2) 100\%$ and $(1 - \alpha/2) 100\%$ percentiles with 2x and
2x+2 degrees of freedom, respectively.

The decision rule is: if $\lambda_0$ is included within the confidence
interval given by inequality (C.2), accept the hypothesis that
$\lambda_0$ is compatible with the observed test data. If $\lambda_0$ is
outside the confidence interval, reject the hypothesis that $\lambda_0$
is compatible with the data and consider an appropriate non-Bayesian
reliability assessment model.
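The decision rule can be sketched as below. Note the assumptions: the
chi-square percentiles are obtained here with the Wilson-Hilferty normal
approximation, which is a stand-in chosen to keep the example
self-contained and is not the manual's method (tabulated percentiles
would normally be used); all function names are hypothetical, and at
least one observed failure (x ≥ 1) is required.

```python
from statistics import NormalDist


def chi2_quantile_wh(p, dof):
    """Approximate chi-square percentile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def rate_interval(x, t, alpha):
    """Two-sided interval for the failure rate, following inequality (C.2)."""
    lower = chi2_quantile_wh(alpha / 2.0, 2 * x) / (2.0 * t)
    upper = chi2_quantile_wh(1.0 - alpha / 2.0, 2 * x + 2) / (2.0 * t)
    return lower, upper


def prior_is_compatible(d, r, x, t, alpha):
    """True if the prior estimate d/r lies inside the (C.2) interval."""
    lower, upper = rate_interval(x, t, alpha)
    return lower < d / r < upper


if __name__ == "__main__":
    # Hypothetical numbers: 5 failures in 1000 test hours, alpha = 0.10.
    print(rate_interval(5, 1000.0, 0.10))
    print(prior_is_compatible(d=2, r=500.0, x=5, t=1000.0, alpha=0.10))
```

With these illustrative inputs the interval is roughly 0.0020 to 0.0105
failures per hour, so a prior estimate of 0.004 would be judged
compatible with the test data.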


C.3  LAPLACE  S  TEST  OF  HOMOGENEITY 

In  a  homogeneous  poisson  process  (HPP), 
successive  times.  T^,  of  failure  of  the  HPP  are 
identical  independent  uniform  random  vari¬ 
ables  (r.v.)  over  the  interval  (0,  t0)  (4J. 
Denoting  any  of  the  T,  variables  as  T,  f(T)  = 
1  /t„ ,  then  the  expected  value  is: 


E(T)  =  j‘°  (T/t„)dT- t0/2 

O 

also 


=  tjVTz 


Using  the  Central  Limit  Theorem,  and 
theorems  on  the  calculation  of  moments  of 
r.v.'s  which  are  the  sum  of  x  identical  in¬ 
dependently  distributed  r.v.’s,  then 


v  =  £  T( 

i=  I 

is  approximately  normally  distributed  e.g. 
x  >  3,  with  expectation  xtc/2  and  standard 
deviation  t0  A/TTx,  and 

«-  T./cxt j-^yy/nr 
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ABSOLUTE  VALUES  OF  THE  MAXIMUM 
DIFFERENCE  BETWEEN  SAMPLE  AND 
POPULATION  CUMULATIVE  FRACTIONS 
SIGNIFICANT  AT  THE  20,  15, 10,  5  AND 
1  PERCENT  LEVELS 


Sample  Size 

Level  of  Significance 

n 

0.20 

0.15 

0.10 

0.05 

0.01 

4 

0.300 

0.319 

0.352 

0.381 

0.417 

5 

0.285 

0.299 

0.315 

0.337 

0.405 

6 

0.265 

0.277 
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0.319 
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7 
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0.224 
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0.275 
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0.250 

17 

0.169 

0.177 
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0.200 
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0.235 

20 

0.160 
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0.174 

0.190 

0.231 

    25         0.142    0.147    0.158    0.173    0.200
    30         0.131    0.136    0.144    0.161    0.181

Over  30 

0.736 
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1.031 

Vn 

Vn 

Vn 

Figure C-6. Kolmogorov Limit Factors K(n, α) for the Normal Distribution with Estimated Mean and Variance
(Lilliefors)
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is  approximately  normally  distributed  with 
mean 0 and standard deviation 1. If the HPP is selected as a null hypothesis, then $u$ can be selected as a statistic to test the hypothesis of homogeneity. The test is best used by considering two-sided critical values on the normal curve, since under wearout or growth the $T_i$ will tend to occur after or before, respectively, the midpoint of the observed interval. In other words [4], significantly large or small values of the standardized variate $u$ show significant evidence of wearout or growth, respectively. This test has been shown to be an optimum test against two plausible models by Bates [5] and Cox [6]. As stated in [4], Laplace's Test is not consistent against alternatives where the rate of occurrence of failure is non-monotone in such a way that the expected value E is:

$$E\left(\sum_{i=1}^{x} T_i\right) = x t_0/2$$

In this case a test developed by Hollander and Proschan [7] is superior.
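The standardized variate $u$ of § C.3 is simple to compute directly. The following is a minimal sketch under stated assumptions: the function name `laplace_statistic` and both data sets are hypothetical illustrations, not data from this manual.

```python
import math


def laplace_statistic(failure_times, t0):
    """Return the standardized Laplace variate u for failures observed on (0, t0).

    u = (sum(T_i) - x*t0/2) / (t0 * sqrt(x/12)), where x is the number of failures.
    """
    x = len(failure_times)
    if x == 0:
        raise ValueError("at least one failure time is required")
    total = sum(failure_times)
    return (total - x * t0 / 2.0) / (t0 * math.sqrt(x / 12.0))


# Hypothetical cumulative failure times (hours) that bunch up late in the interval.
late_failures = [120.0, 340.0, 560.0, 700.0, 810.0, 900.0, 960.0, 990.0]
u_late = laplace_statistic(late_failures, t0=1000.0)

# Hypothetical failure times centered on the midpoint of the interval give u = 0.
balanced = [100.0, 300.0, 500.0, 700.0, 900.0]
u_balanced = laplace_statistic(balanced, t0=1000.0)

print(f"u (late failures) = {u_late:.3f}")
print(f"u (balanced)      = {u_balanced:.3f}")
```

The resulting $u$ is compared with two-sided critical values of the standard normal distribution; a large positive value points toward wearout and a large negative value toward growth, as described above.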
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Appendix  D 
DERIVATIONS 


This  appendix  contains: 

D.1  Derivation of Availability Confidence Limit Formulae for exponential failure and recovery times, and for exponential failure times and lognormal recovery times.

D.2  Derivation of the Poisson Process and of the Exponential Reliability Law.

D.3  Introduction to Birth and Death Processes.

D.1  DERIVATION OF AVAILABILITY
CONFIDENCE LIMIT FORMULAE

The  two  cases  considered  under  this  head¬ 
ing  are  (a)  failure  and  recovery  times  are 
both  exponential,  and  (b)  failure  times  are 
exponential  but  recovery  times  are  lognormal. 

D.1.1  Confidence Limit for Exponential
Failure and Recovery Times [1]

For exponential failure and recovery times, the PDF of operating times between failures ($t$) for an item is $f(t) = \lambda e^{-\lambda t}$ and the PDF of times to repair ($M_c$) is $f(M_c) = \mu e^{-\mu M_c}$. In the exponential model, the instantaneous failure rate $\lambda$ and the instantaneous repair rate $\mu$ are constant.

D.1.1.1  Test Truncated by Failure

In the test truncated by failures case, the number of failures $x$ is decided in advance. Several scenarios are possible.

(a) An item may be placed on test and, as soon as it fails, replaced with a new or good-as-new item, and so on until $x$ failures are recorded. The operating times between the $(i-1)$th and the $i$th failure are denoted by $t_i$.

The ML estimate $\hat\lambda$ of $\lambda$ is obtained from the likelihood function of the observations:

$$L = f(t_1)\,f(t_2)\cdots f(t_x) = \lambda^x \exp\left(-\lambda \sum_{i=1}^{x} t_i\right).$$

Differentiating with respect to $\lambda$ and setting to 0 yields

$$\hat\lambda = x \Big/ \sum_{i=1}^{x} t_i \qquad \text{(D-1)}$$

(b) $N$ identical items can be put on test without replacement until $x$ of them fail. If the $t_i$'s represent the times $t_i^*$ between the $(i-1)$th and the $i$th failure multiplied by the number of operating items between the $(i-1)$th and the $i$th failure, that is, multiplied by $N+1-i$, then $\hat\lambda$ is given either by

$$\hat\lambda = x \Big/ \sum_{i=1}^{x} (N+1-i)\,t_i^*$$

or by

$$\hat\lambda = x \Big/ \sum_{i=1}^{x} t_i. \qquad \text{(D-1)}$$

These formulas are consistent with the formula given in section 5,

$$\hat\lambda = x \Big/ \left[\sum_{j=1}^{x} t_j + (N-x)\,t_x\right]$$

where $t_j$ represents the time on test of each failed item since the beginning of testing and $t_x$ the time to failure of the last item to fail.
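The consistency claimed for scenario (b) can be checked numerically: both expressions compute the same total time on test. The sketch below assumes hypothetical data (5 items, test stopped at the third failure); the function names are illustrative only.

```python
def lambda_hat_from_intervals(n_items, interfailure_times):
    """Estimate lambda from the times t*_i between successive failures.

    Each interval is weighted by the N+1-i items still operating during it.
    """
    x = len(interfailure_times)
    total_time = sum(
        (n_items + 1 - i) * t_star
        for i, t_star in enumerate(interfailure_times, start=1)
    )
    return x / total_time


def lambda_hat_from_failure_ages(n_items, failure_ages):
    """Estimate lambda from each failed item's age at failure (section 5 form).

    The N-x survivors each contribute the age t_x of the last item to fail.
    """
    ages = sorted(failure_ages)
    x = len(ages)
    total_time = sum(ages) + (n_items - x) * ages[-1]
    return x / total_time


# Hypothetical test: N = 5 items, failures at 100, 250 and 450 hours.
ages = [100.0, 250.0, 450.0]
intervals = [ages[0]] + [b - a for a, b in zip(ages, ages[1:])]  # 100, 150, 200

est_intervals = lambda_hat_from_intervals(5, intervals)
est_ages = lambda_hat_from_failure_ages(5, ages)
print(est_intervals, est_ages)  # both equal 3/1700 failures per hour
```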

(c) $N$ identical items can be put on test with replacement until $x$ failures are observed. If the $t_i$'s now represent the times $t_i^*$ between the $(i-1)$th and the $i$th failure multiplied by the number of operating items between the $(i-1)$th and the $i$th failure, that is, multiplied by $N$, then $\hat\lambda$ is given either by:

$$\hat\lambda = x \Big/ \left(N \sum_{i=1}^{x} t_i^*\right)$$

or by

$$\hat\lambda = x \Big/ \sum_{i=1}^{x} t_i. \qquad \text{(D-1)}$$

This is again consistent with the formula from Section 5

$$\hat\lambda = x/(N t_x)$$

where $t_x$ represents the time on test of the item failing last.

In all cases, under the appropriate definition of $t_i$,

$$\hat\lambda = x \Big/ \sum_{i=1}^{x} t_i \qquad \text{(D-1)}$$

In order to find the PDF of $\hat\lambda$, one may first consider the PDF of the variable

$$T = \sum_{i=1}^{x} t_i$$

which, as the sum of $x$ identically distributed exponential r.v.'s of the form $f(t_i) = \lambda e^{-\lambda t_i}$, is gamma:

$$\gamma_x(T) = T^{x-1} \lambda^x \exp(-\lambda T)/(x-1)!\,.$$

The transformation

$$u = \sum_{i=1}^{x} t_i / x = T/x = 1/\hat\lambda$$

yields:

$$k_x(u) = x\,\gamma_x(xu) = \frac{x\,(xu)^{x-1} \lambda^x \exp(-\lambda x u)}{(x-1)!}$$

and since $\hat\lambda = 1/u$, and

$$g_x(\hat\lambda) = k_x\big(u(\hat\lambda)\big)\left|\frac{du(\hat\lambda)}{d\hat\lambda}\right|,$$

then

$$g_x(\hat\lambda) = x^x \lambda^x \hat\lambda^{-(x+1)} \exp(-\lambda x/\hat\lambda)/(x-1)!\,.$$

The variable of interest in finding confidence limits on availability in § D.1.1.3 is

$$V = 2x\lambda/\hat\lambda \quad \text{or} \quad V = 2\lambda \sum_{i=1}^{x} t_i.$$

Using $k_x(u)$ or $g_x(\hat\lambda)$, one finds that the PDF of $V$ is:

$$h_x(V) = V^{x-1} \exp(-V/2)/[2^x (x-1)!] \qquad \text{(D-2)}$$

which is a chi-square ($\chi^2$) PDF with $2x$ degrees of freedom.

D.1.1.2  Test Truncated by Time

In the case of time truncated tests, the total time on test $T$ is decided in advance of testing. The test scenarios of § D.1.1.1 still apply provided the estimator of $\lambda$ used is

$$\hat\lambda = x/T \quad \text{instead of} \quad x \Big/ \sum_{i=1}^{x} t_i.$$

In the expression $\hat\lambda = x/T$, $x$ represents the number of failures occurring in $T$, where

$$T = \sum_{i=1}^{x} t_i + t_{x+1},$$

and $t_i$ represents the operating time between the $(i-1)$th and the $i$th failure, and $t_{x+1}$ the operating time, not necessarily ending in failure, between the $x$th failure and the end of test time $T$.

In § D.1.1.1 the distribution of

$$V = 2\lambda \sum_{i=1}^{x} t_i \quad \text{or} \quad V = 2x\lambda/\hat\lambda$$

was derived. This distribution could be readily obtained under the assumption of times ending in failure. It is not possible to use a parallel argument for $V = 2\lambda T$ in the case of tests truncated by time since both $\lambda$ and $T$ are constant, and $V$ is not a random variable in this case.

For tests truncated by time, $x$, the number of failures, is actually the random variable of interest. One can, nevertheless, arrive at a reasonable treatment of $\lambda T$ as a random variable if one accepts the Bayesian argument that the value of the probability density function for $\lambda T$ is proportional to the probability of observing $x$ failures under that value for $\lambda T$.

For example, suppose 1 failure was observed.

For $\lambda T = 2$, $P(x) = P(1) = \dfrac{2^1 e^{-2}}{1!} = 2e^{-2}$.

For $\lambda T = 3$, $P(x) = P(1) = \dfrac{3^1 e^{-3}}{1!} = 3e^{-3}$.

Therefore, the ratio of the ordinates of the PDF for $\lambda T$, which we will call $f(\lambda T)$, is given by the equation

$$\frac{f(2)}{f(3)} = \frac{2e^{-2}}{3e^{-3}} = \frac{2e}{3}$$

or $\lambda T$ is $2e/3$ times as likely to be in a small neighborhood near 2 as it is to be in a same size neighborhood near 3. If this reasoning is carried out for all values of $\lambda T$, then for one failure:

$$f(\lambda T) = f(u) \propto \frac{u^1 e^{-u}}{1!}$$

and for any number of failures, $x$,

$$f(\lambda T) = f(u) \propto \frac{u^x e^{-u}}{x!}.$$

Since $\int_0^\infty f(u)\,du = 1$, then

$$f(u) = \frac{u^x e^{-u}/x!}{\int_0^\infty u^x e^{-u}\,du/x!} = u^x e^{-u}/x!$$

If we substitute $u = 2\lambda T$ for $\lambda T$, we find:

$$f(2\lambda T) = f(u) = u^x e^{-u/2}/(x!\,2^{x+1}) \qquad \text{(D-3)}$$

which shows that $2\lambda T$ is $\chi^2$ with $\nu = 2x+2$ degrees of freedom. Also, since $\hat\lambda = x/T$,

$$2x\lambda/\hat\lambda = 2\lambda T$$

is also distributed as $\chi^2$ with $2x+2$ degrees of freedom.
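The argument of § D.1.1.2 can be checked numerically. The sketch below (function names are illustrative assumptions) reproduces the ratio $2e/3$ of the one-failure example and confirms that rescaling the normalized Poisson likelihood to the variable $2\lambda T$ gives the chi-square density with $2x+2$ degrees of freedom of equation (D-3).

```python
import math


def poisson_prob(x, lam_t):
    """Probability of observing x failures when the expected number is lam_t."""
    return lam_t ** x * math.exp(-lam_t) / math.factorial(x)


def chi2_pdf(v, dof):
    """Chi-square density with dof degrees of freedom, evaluated at v > 0."""
    k = dof / 2.0
    return v ** (k - 1.0) * math.exp(-v / 2.0) / (2.0 ** k * math.gamma(k))


# One-failure example: ratio of likelihoods at lambda*T = 2 and lambda*T = 3.
ratio = poisson_prob(1, 2.0) / poisson_prob(1, 3.0)

# Change of variable v = 2*lambda*T: the density of v is f(v/2)/2.
x = 5          # hypothetical number of failures
v = 7.3        # hypothetical value of 2*lambda*T
density_from_likelihood = poisson_prob(x, v / 2.0) / 2.0
density_chi2 = chi2_pdf(v, 2 * x + 2)

print(ratio, 2.0 * math.e / 3.0)
print(density_from_likelihood, density_chi2)
```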

D.1.1.3  Confidence Limit on Availability
for Exponential Failure and
Exponential Repair Times

For exponential repair times, $f(M_c) = \mu e^{-\mu M_c}$, where $M_c$ is repair time and $\mu$ is the constant instantaneous repair rate.

The maximum likelihood estimate of $\mu$ is

$$\hat\mu = m \Big/ \sum_{i=1}^{m} M_{c_i}$$

where $m$ is the number of repairs observed and $M_{c_i}$ is the time required for the $i$th repair. The density function of $\hat\mu$ is obtained by a procedure identical to that for the density of $\hat\lambda$:

$$g_m(\hat\mu) = m^m \mu^m \hat\mu^{-(m+1)} \exp(-m\mu/\hat\mu)/(m-1)!\,.$$

Let the random variable

$$v = 2m\mu/\hat\mu.$$

It is assumed that each repair time $M_{c_i}$ is terminated by a repair. Then:

$$g(v) = \frac{v^{m-1} e^{-v/2}}{2^m (m-1)!} \qquad \text{(D-4)}$$

is a $\chi^2$ density with $2m$ degrees of freedom.

It was shown in § D.1.1.1 that $u = 2x\lambda/\hat\lambda$ is distributed as $\chi^2$ with $2x$ degrees of freedom for tests truncated by failure. Since $u$ and $v$ are independently distributed $\chi^2$ variables with $2x$ and $2m$ degrees of freedom, respectively, the quantity

$$\frac{u/2x}{v/2m}$$

has the variance ratio density $F$ with $2x$ and $2m$ degrees of freedom. But

$$\frac{u/2x}{v/2m} = \frac{\lambda/\hat\lambda}{\mu/\hat\mu} = \frac{\lambda/\mu}{\hat\lambda/\hat\mu}$$

and since the $i$th component availability is

$$A_i = \frac{1}{1 + (\lambda/\mu)},$$

a $100(1-\gamma)$ percent lower confidence bound on $A_i$ is obtained, where testing is terminated by failure, from the relation:

$$\left(\frac{\lambda}{\mu}\right)_U = \frac{\hat\lambda}{\hat\mu}\,F_{1-\gamma;\,2x,\,2m} \qquad \text{(D-5)}$$

$F_{1-\gamma;\,2x,\,2m}$ is the $(1-\gamma)$ fractile of the cumulative $F$ distribution with $2x$ and $2m$ degrees of freedom. It is tabulated in Appendix E, figure E-6, which gives $F_{.80;\,f_1,\,f_2}$ for $f_1 = 1(1)80$ and $f_2 = 1(1)80$.

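For tests truncated by time, it was shown in § D.1.1.2 that under a particular assumption $2x\lambda/\hat\lambda$ was distributed as $\chi^2_{2x+2}$. In this case the $100(1-\gamma)\%$ lower availability confidence limit is obtained by using $F_{1-\gamma;\,2x+2,\,2m}$ in

$$\left(\frac{\lambda}{\mu}\right)_U = \frac{x+1}{x}\,\frac{\hat\lambda}{\hat\mu}\,F_{1-\gamma;\,2x+2,\,2m} \qquad \text{(D-6)}$$

since

$$\frac{u/(2x+2)}{v/2m} = \frac{x}{x+1}\,\frac{\lambda/\mu}{\hat\lambda/\hat\mu}. \qquad \text{(D-7)}$$

The case $x = 0$ is indeterminate, since $\hat\lambda = 0$. One can use:

$$\left(\frac{\lambda}{\mu}\right)_U = \frac{x+1}{m\,T}\sum_{i=1}^{m} M_{c_i}\;F_{1-\gamma;\,2x+2,\,2m} \qquad \text{(D-8)}$$

D.1.2  Estimation of the Parameters of a
Lognormal Distribution [2]

A random variable $z$ has a lognormal distribution if it has the density function

$$\lambda(z;\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}\,z}\exp\left[-\frac{1}{2}\left(\frac{\ln z - \mu}{\sigma}\right)^2\right]$$

Then its logarithm $y = \ln z$ has a normal distribution

$$n(y;\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2\right]$$

The cumulative lognormal distribution is denoted $\Lambda(z;\mu,\sigma^2)$. An expression for it does not exist in terms of elementary functions, but by the simple transformation $y = \ln z$ it is transformed into the normal integral

$$\Lambda(z;\mu,\sigma^2) = N(\ln z;\mu,\sigma^2)$$

and by a further simple transformation to the standard normal variate

$$x = \frac{y-\mu}{\sigma}$$

cumulative probabilities of the lognormal can be found numerically from tabulated values of the standard normal $N(x;0,1)$.

The parameters $\mu$ and $\sigma$ can be estimated from lognormal data by means of the transformation

$$\hat\mu = \overline{\ln z} = \frac{1}{n}\sum_{i=1}^{n} \ln z_i$$

$$\hat\sigma = \sqrt{\frac{\sum_{i=1}^{n} (\ln z_i - \overline{\ln z})^2}{n-1}}$$

D.1.3  Confidence Limit for Exponential
Failure Times and Lognormal
Recovery Times

Gray and Lewis [3] have shown that if $\sigma^2$ (the variance of $\ln M_c$) is assumed to be known, then for a random sample of $m$ repair times

[equation not recoverable from the source]

where $M_G$ is the geometric mean and $\gamma$ is $1/M_G$. For purposes of analysis $\sigma^2$ is assumed to equal the variance estimated from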
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the  sample.  For  exponentially  distributed 
failure  times 


2xX 

* 


X1  (2x). 


Then  a  confidence  interval  for  X/a  can  be 
found  by  finding  the  distribution  of 


axx/Kya/zre")^^. 


then 

PlW<al 


P  *  ^  a  exp  (a1  /2)~1 

M  Jt  2x  J 


l-o. 


Then  a  one-sided  100  (1  -  a)  percent  con¬ 
fidence  limit  for  A,  is 


(D-9) 


Figure  E-7  gives  selected  values  of  the  coeffi¬ 
cient 


In  general  then,  letting  U  - 


V  ~  A 


XJ  (k)  and 


f(u,v) 


k 


II 

'e '*  sfrtml2  <(*" 

0  ^  v 


0<u<°° 

0<vO> 


m  £> 

ai^-rr  .x,for 


Jn i 

•  =  — —ff  p  -—  la*  *-i  w*L>*-« 

L2^  2  J 


didw 


.20. 


(D-10) 


Let  W  =  U/V  and  Z  =  V ;  then 


elsewhere 


D.2  DERIVATION  OF  THE  POISSON 
PROCESS  AND  OF  THE 
EXPONENTIAL  RELIABILITY  LAW 


r(ir)2k'!'/*r 


Hence  P[a<W<b]  isgivenby: 

P  [a  <W  <bl  -  £  g(w)  dw  *  p. 


D.2.1  Assumptions 

The  assumptions  underlying  the  Poisson 
failure  process  g(x,t)  were  presented  in 
§  5.1.1  and  will  not  be  repeated  here. 

D.2.2  Derivation  of  the  Poisson  Failure 
Process 

From the assumptions, the probability of occurrence of 0 failure-inducing shocks prior to the time $t+\Delta t$ is given by:

$$g(0,t+\Delta t) = g(0,t)\,(1-\lambda\Delta t). \qquad \text{(D-11)}$$

The probability of $x$ failure-inducing shocks prior to the time $t+\Delta t$ is given by:


Let 


$$g(x,t+\Delta t) = g(x,t)\,(1-\lambda\Delta t) + g(x-1,t)\,(\lambda\Delta t), \quad x > 0. \qquad \text{(D-12)}$$


2v'e*‘xX 

w-— 


From equations (D-11) and (D-12), the following differential equations arise:

$$g'(0,t) = -\lambda\,g(0,t)$$

$$g'(x,t) = \lambda\,[g(x-1,t) - g(x,t)], \quad x > 0$$

with $g(0,0) = 1$, and $g(x,0) = 0$ for $x > 0$.

The solution to this set of linear first-order differential equations is:

$$g(x,t) = (\lambda t)^x e^{-\lambda t}/x!, \quad x = 0, 1, 2, \ldots$$

D.2.3  The Exponential Reliability Law

By definition, $R(t) = g(0,t) = e^{-\lambda t}$. The corresponding PDF is

$$f(t) = -\frac{d}{dt}\left(e^{-\lambda t}\right) = \lambda e^{-\lambda t}.$$
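As an informal check of § D.2.2, the difference equations (D-11) and (D-12) can be stepped forward with a small $\Delta t$ and compared with the closed-form Poisson solution. This is a sketch only; the function names, the failure rate, and the step count are assumptions chosen for illustration.

```python
import math


def poisson_by_difference_equations(lam, t_end, x_max, steps=20000):
    """Step equations (D-11) and (D-12) from t = 0 to t_end.

    Returns [g(0,t_end), ..., g(x_max,t_end)], starting from g(0,0) = 1.
    """
    dt = t_end / steps
    g = [1.0] + [0.0] * x_max
    for _ in range(steps):
        new_g = [g[0] * (1.0 - lam * dt)]                      # (D-11)
        for x in range(1, x_max + 1):                          # (D-12)
            new_g.append(g[x] * (1.0 - lam * dt) + g[x - 1] * lam * dt)
        g = new_g
    return g


def poisson_closed_form(lam, t, x):
    """g(x,t) = (lam*t)^x * exp(-lam*t) / x!"""
    return (lam * t) ** x * math.exp(-lam * t) / math.factorial(x)


lam, t_end = 0.01, 100.0            # hypothetical rate (per hour) and time
numeric = poisson_by_difference_equations(lam, t_end, x_max=4)
exact = [poisson_closed_form(lam, t_end, x) for x in range(5)]
for x, (a, b) in enumerate(zip(numeric, exact)):
    print(x, round(a, 5), round(b, 5))

reliability = numeric[0]            # R(t) = g(0,t)
```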

D.3  INTRODUCTION  TO  BIRTH  AND 
DEATH  PROCESSES 

The reliability or availability of many systems, particularly redundant and repairable systems, may be derived by considering Birth and Death processes as expounded initially in [4] and comprehensively developed for RMA application in [5]. A single element is considered first; then more complex structures are treated.

D.3.1  Reliability of Non-Repairable Element

Figure D-1 shows an example of a death-only process as applied to a single non-repairable element. The element can only be in two states, up (U) or down (D). The hazard rate $\lambda$ is assumed to be constant and is shown directed toward the absorbing state D. D is termed an absorbing state because once the element has reached this state, it cannot leave it and remains there with probability 1.0. A graph such as is shown in figure D-1, called a Markov graph, depicts the states and transition probabilities between them.


Figure D-1. Markov Graph for a Single
Non-Repairable Element


A state transition matrix can be formed from figure D-1. It is shown in figure D-2.


Figure D-2. State Transition Matrix for One Element


In the state transition matrix, the entry $\lambda\Delta t$, for instance, represents the probability of going from state U at time $t$ to state D at time $t+\Delta t$.

A set of difference equations is readily written for the graph of figure D-1, or for the state transition matrix of figure D-2.


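$$P_U(t+\Delta t) = (1-\lambda\Delta t)\,P_U(t) \qquad \text{(D-13)}$$

$$P_D(t+\Delta t) = P_D(t) + \lambda\Delta t\,P_U(t) \qquad \text{(D-14)}$$

In the equations above, $P_U$ and $P_D$ stand for the probability of states U and D, respectively.

Under the assumption that the unit is up at time 0,

$$P_U(0) = 1, \quad P_D(0) = 0.$$

Rewriting (D-13) and (D-14) as:

$$\frac{P_U(t+\Delta t) - P_U(t)}{\Delta t} + \lambda P_U(t) = 0 \qquad \text{(D-15)}$$

$$\frac{P_D(t+\Delta t) - P_D(t)}{\Delta t} = \lambda P_U(t) \qquad \text{(D-16)}$$

Letting $\Delta t \to 0$, the following differential equations are obtained:

$$P_U'(t) + \lambda P_U(t) = 0, \quad P_U(0) = 1 \qquad \text{(D-17)}$$

$$P_D'(t) = \lambda P_U(t), \quad P_D(0) = 0. \qquad \text{(D-18)}$$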
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Solving  for  Py(t)  in  (D-17): 

$$P_U'(t)/P_U(t) = -\lambda, \qquad \frac{d}{dt}\ln\big(P_U(t)\big) = -\lambda$$

$$\ln\big(P_U(t)\big) = -\lambda t + C_1$$

$P_U(t) = Ce^{-\lambda t}$, and the condition $P_U(0) = 1 = C$ leads to $P_U(t) = e^{-\lambda t}$.

Similarly, from equation (D-18):

$$P_D'(t) = \lambda P_U(t) = \lambda e^{-\lambda t}$$

$$P_D(t) = \int \lambda e^{-\lambda t}\,dt = -e^{-\lambda t} + C$$

and since

$$P_D(0) = 0, \quad 0 = -e^{-\lambda(0)} + C, \quad C = 1$$

then:

$$P_D(t) = 1 - e^{-\lambda t}.$$


D.3.2  Availability of a Repairable Element

Figure D-3 shows a birth and death process as applied to a repairable element. Again, the element can only exist in two states, up (U) or down (D). The hazard rate $\lambda$ and repair rate $\mu = (\text{MTTR})^{-1}$ are assumed to be constant. State D is no longer absorbing but is a reflecting barrier, because it is possible for the element to go back to state U by repair.

Figure D-3. Markov Graph for a Repairable
Single Element

The corresponding state transition matrix is depicted in figure D-4.

Figure D-4. State Transition Matrix for a Repairable
Element

The difference equations for the states depicted in figures D-3 and D-4 are:

$$P_U(t+\Delta t) = (1-\lambda\Delta t)\,P_U(t) + \mu\Delta t\,P_D(t) \qquad \text{(D-19)}$$

$$P_D(t+\Delta t) = \lambda\Delta t\,P_U(t) + (1-\mu\Delta t)\,P_D(t) \qquad \text{(D-20)}$$

with

$$P_U(0) = 1, \quad P_D(0) = 0 \quad \text{as in § D.3.1.}$$

Equations (D-19) and (D-20) lead to:

$$\frac{P_U(t+\Delta t) - P_U(t)}{\Delta t} = -\lambda P_U(t) + \mu P_D(t)$$

$$\frac{P_D(t+\Delta t) - P_D(t)}{\Delta t} = \lambda P_U(t) - \mu P_D(t)$$

and, letting $\Delta t \to 0$, we have the differential equations:

$$P_U'(t) + \lambda P_U(t) = \mu P_D(t) \qquad \text{(D-21)}$$

$$P_D'(t) + \mu P_D(t) = \lambda P_U(t) \qquad \text{(D-22)}$$

$$P_U(0) = 1, \quad P_D(0) = 0.$$

Equations (D-21) and (D-22) can be solved in a number of different ways; choosing the Laplace transform method gives

$$sL(P_U) - 1 + \lambda L(P_U) = \mu L(P_D)$$

$$sL(P_D) + \mu L(P_D) = \lambda L(P_U).$$
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Solving  for  L  (Pu ): 


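$$L(P_U) = \frac{s+\mu}{s^2 + s(\mu+\lambda)} = \frac{\mu}{(\mu+\lambda)\,s} + \frac{\lambda}{(\mu+\lambda)(s+\mu+\lambda)}$$

$$A(t) = P_U(t) = L^{-1}\big(L(P_U)\big) = \frac{\mu}{\mu+\lambda} + \frac{\lambda}{\mu+\lambda}\,e^{-(\mu+\lambda)t}.$$

This expression for $A(t)$ is identical with an expression obtained earlier by a different method.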
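The closed-form availability can be cross-checked by stepping the difference equations (D-19) and (D-20) directly. The following sketch assumes hypothetical rates and a fixed step count; the function names are illustrative.

```python
import math


def availability_closed_form(lam, mu, t):
    """A(t) = mu/(mu+lam) + lam/(mu+lam) * exp(-(mu+lam)*t)."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)


def availability_by_difference_equations(lam, mu, t_end, steps=20000):
    """Step (D-19) and (D-20) from P_U(0) = 1, P_D(0) = 0 and return P_U(t_end)."""
    dt = t_end / steps
    p_up, p_down = 1.0, 0.0
    for _ in range(steps):
        # The right-hand side uses the values from the previous step only.
        p_up, p_down = (
            (1.0 - lam * dt) * p_up + mu * dt * p_down,    # (D-19)
            lam * dt * p_up + (1.0 - mu * dt) * p_down,    # (D-20)
        )
    return p_up


lam, mu = 0.01, 0.5                 # hypothetical failure and repair rates per hour
a_exact = availability_closed_form(lam, mu, 10.0)
a_numeric = availability_by_difference_equations(lam, mu, 10.0)
a_steady = mu / (lam + mu)          # limit of A(t) as t grows
print(a_exact, a_numeric, a_steady)
```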

D.3.3  Mean  Time  Between  Failure  for  a 
Repairable  Redundant  System 

D.3.3.1  Unrestricted Repair

Consider a parallel active redundant system consisting of six operating elements, four of which must be "up" for the structure to be operational (see figure D-5). The exponential law applies to failures and repairs; that is, the failure rate $\lambda$ and repair rate $\mu$ are constant. Repairs are unrestricted, in the sense that there are as many repairmen as needed to repair failed elements. Each repairman works on a single element and repairs it with rate $\mu$.

Figure D-5. 4 of 6 Parallel Redundant System

Figure D-6 shows a Markov graph for the structure.

Figure D-6. Markov Graph for 4 of 6 Repairable
System with Unrestricted Repair

Four states are shown in figure D-6: states 6, 5, 4, and 0. State 6 is the state "Six elements are up," state 5 is the state "Five elements are up," state 4 is the state "Four elements are up," and state 0 is the state "Fewer than four elements are up." States 6, 5, and 4 are system success states; state 0 is the system failure state. The $2\mu$ shown between states 4 and 5 reflects the fact that 2 repairmen are available to work on the 2 failed elements.

If only states 4 and 0 existed, then one would be able to write the following difference equation:

$$R_4(t+\Delta t) = R_4(t) - 4\lambda R_4(t)\,\Delta t.$$

Letting $\Delta t \to 0$, this leads to

$$R_4'(t) + 4\lambda R_4(t) = 0, \quad R_4(0) = 1$$

with the solution $R_4(t) = e^{-4\lambda t}$,

and $\text{MTTF} = \int_0^\infty e^{-4\lambda t}\,dt = \dfrac{1}{4\lambda}$.

Assume, however, that $R_4$ is in a steady state. Then $R_4$ is a constant and the failure rate becomes $4\lambda R_4$ with

$$\text{MTTF (instantaneous)} = \frac{1}{4\lambda R_4}. \qquad \text{(D-23)}$$
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R4  is  actually  unknown  but  can  be  calculated 
under  steady-state  conditions,  as  follows: 

At state 6, $R_6(6\lambda) = R_5(\mu)$.  (D-24)

At state 5, $R_5(5\lambda) = R_4(2\mu)$.  (D-25)

Also $R_0 = 0$ (since this is the failed state), and

$$R_4 + R_5 + R_6 = 1.$$
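The steady-state relations (D-24) and (D-25), together with the normalization $R_4 + R_5 + R_6 = 1$, determine $R_4$ and hence the value of (D-23). The sketch below solves them for hypothetical rates; the function name and the `repairmen_at_state_4` parameter are assumptions introduced for illustration (2 reproduces the unrestricted case above).

```python
def steady_state_4_of_6(lam, mu, repairmen_at_state_4=2):
    """Solve (D-24), (D-25) and R4 + R5 + R6 = 1 for the 4 of 6 system.

    Returns (R6, R5, R4, mtbf) with mtbf = 1/(4*lam*R4) as in (D-23).
    """
    # Work in units of R6, then normalize.
    r6 = 1.0
    r5 = 6.0 * lam * r6 / mu                              # (D-24)
    r4 = 5.0 * lam * r5 / (repairmen_at_state_4 * mu)     # (D-25)
    total = r6 + r5 + r4
    r6, r5, r4 = r6 / total, r5 / total, r4 / total
    mtbf = 1.0 / (4.0 * lam * r4)
    return r6, r5, r4, mtbf


lam, mu = 0.001, 0.1                # hypothetical rates per hour
r6, r5, r4, mtbf = steady_state_4_of_6(lam, mu)

# Algebraic elimination of the same three equations gives this closed form.
mtbf_closed = (mu ** 2 + 6.0 * lam * mu + 15.0 * lam ** 2) / (60.0 * lam ** 3)
print(r6, r5, r4, mtbf, mtbf_closed)
```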


D.3.3.2  Restricted Repair

Under restricted repair, only a single failed element can be worked on at a time. Figure D-7 shows a Markov graph for the system.

Figure D-7. Markov Graph for 4 of 6 Repairable
System with Restricted Repair

To calculate the MTBF, equations (D-24) and (D-25) are rewritten as:

$$R_6(6\lambda) = R_5(\mu).$$

$$R_5(5\lambda) = R_4(\mu).$$

D.3.3.3  4 of 6 Repairable System - Transient
Solution

Thus far the treatment of the 4 of 6 repairable system has assumed that steady state conditions had been attained. This means, in particular, that the MTBF calculated was the MTBF of the structure after the 1st failure and after subsequent failures (when only 4 elements need be up at the start of operation), not the mean time to first failure, or MTTF (when 6 elements are up at the start of operation).

Although it is still possible to derive the MTTF starting with 6 elements up, by a method similar to the one already shown, a full Markov formulation is presented below for illustration. We will consider, however, only the restricted repair case.

Figure D-8. Markov Graph for 4 of 6 Repairable
System with Restricted Repair

From the transition matrix of Figure D-8, the following difference equations can be written.

$$P_6(t+\Delta t) = (1-6\lambda\Delta t)\,P_6(t) + \mu\Delta t\,P_5(t)$$

$$P_5(t+\Delta t) = (1-(5\lambda+\mu)\Delta t)\,P_5(t) + (6\lambda\Delta t)\,P_6(t) + \mu\Delta t\,P_4(t)$$

$$P_4(t+\Delta t) = (1-(4\lambda+\mu)\Delta t)\,P_4(t) + (5\lambda\Delta t)\,P_5(t) + \mu\Delta t\,P_3(t)$$

$$P_3(t+\Delta t) = (1-(3\lambda+\mu)\Delta t)\,P_3(t) + (4\lambda\Delta t)\,P_4(t)$$

$$P_0(t+\Delta t) = P_0(t) + 3\lambda\Delta t\,P_3(t)$$

The initial conditions are: $P_6(0) = 1$, $P_5(0) = P_4(0) = P_3(0) = P_0(0) = 0$. The difference equations lead to the following differential equations:

$$P_6' + 6\lambda P_6 = \mu P_5, \quad P_6(0) = 1 \qquad \text{(D-26)}$$

$$P_5' + (5\lambda+\mu) P_5 = 6\lambda P_6 + \mu P_4, \quad P_5(0) = 0 \qquad \text{(D-27)}$$

$$P_4' + (4\lambda+\mu) P_4 = 5\lambda P_5 + \mu P_3, \quad P_4(0) = 0 \qquad \text{(D-28)}$$

$$P_3' + (3\lambda+\mu) P_3 = 4\lambda P_4, \quad P_3(0) = 0 \qquad \text{(D-29)}$$

$$P_0' = 3\lambda P_3, \quad P_0(0) = 0. \qquad \text{(D-30)}$$

Using Laplace transforms to solve (D-26) - (D-30), we have:

$$sL(P_6) - 1 + 6\lambda L(P_6) = \mu L(P_5) \qquad \text{(D-31)}$$

$$sL(P_5) + (5\lambda+\mu) L(P_5) = 6\lambda L(P_6) + \mu L(P_4) \qquad \text{(D-32)}$$

$$sL(P_4) + (4\lambda+\mu) L(P_4) = 5\lambda L(P_5) + \mu L(P_3) \qquad \text{(D-33)}$$

$$sL(P_3) + (3\lambda+\mu) L(P_3) = 4\lambda L(P_4) \qquad \text{(D-34)}$$

$$sL(P_0) = 3\lambda L(P_3) \qquad \text{(D-35)}$$

It is now a simple matter to solve algebraically for $L(P_6)$, $L(P_5)$, $L(P_4)$, $L(P_3)$, and $L(P_0)$, and to find the inverse Laplace transforms $P_6$, $P_5$, $P_4$, $P_3$, and $P_0$.

The reliability of the structure at time $t$ is then given by:

$$R(t) = P_6(t) + P_5(t) + P_4(t)$$

while the mean time to first failure or MTTF is given by

$$\text{MTTF} = \int_0^\infty R(t)\,dt.$$

D.3.4  Reliability of a 1 of 3 Standby System
with a Dormant Hazard Rate and No
Repair

In standby redundant systems, standby units may have a positive hazard rate $\lambda_D$ before they replace a failed operating element, and an operating hazard rate $\lambda > \lambda_D$ after they replace a failed element. Such standby elements are said to display a dormant hazard rate.

Consider a 1 of 3 standby parallel system with perfect switching, such as depicted in figure D-9.

Figure D-9. 1 of 3 Structure: One Operating and
Two Standby Elements

Defining state 0 as "all three units down," state 1 as "the operating unit up," state 2 as "the operating unit and a standby up," and state 3 as "the operating unit and two standby units up," the state transition matrix is as depicted in figure D-10.

                                To state
From state    3                    2                    1               0
    3         1-(λ+2λD)Δt          (λ+2λD)Δt            0               0
    2         0                    1-(λ+λD)Δt           (λ+λD)Δt        0
    1         0                    0                    1-λΔt           λΔt
    0         0                    0                    0               1

Figure D-10. State Transition Matrix for 1 of 3
System with Dormant Hazard Rate

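The differential equations formed from the state transition matrix are:

$$P_3'(t) = -(\lambda+2\lambda_D)\,P_3(t), \quad P_3(0) = 1 \qquad \text{(D-36)}$$

$$P_2'(t) = -(\lambda+\lambda_D)\,P_2(t) + (\lambda+2\lambda_D)\,P_3(t), \quad P_2(0) = 0 \qquad \text{(D-37)}$$

$$P_1'(t) = -\lambda P_1(t) + (\lambda+\lambda_D)\,P_2(t), \quad P_1(0) = 0 \qquad \text{(D-38)}$$

$$P_0'(t) = \lambda P_1(t), \quad P_0(0) = 0 \qquad \text{(D-39)}$$

Solutions of differential equations (D-36 - D-39) are:

$$P_3(t) = e^{-(\lambda+2\lambda_D)t}$$

$$P_2(t) = \frac{\lambda+2\lambda_D}{\lambda_D}\left(e^{-(\lambda+\lambda_D)t} - e^{-(\lambda+2\lambda_D)t}\right)$$

$$P_1(t) = \frac{(\lambda+\lambda_D)(\lambda+2\lambda_D)}{2\lambda_D^2}\left(e^{-(\lambda+2\lambda_D)t} + e^{-\lambda t} - 2e^{-(\lambda+\lambda_D)t}\right)$$

$$P_0(t) = 1 - P_3(t) - P_2(t) - P_1(t).$$

The reliability of the system, based on the occurrence of either one of the states 1, 2 or 3, is:

$$R(t) = P_3(t) + P_2(t) + P_1(t).$$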
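The state probabilities of § D.3.4 can be verified by integrating (D-36) through (D-39) numerically. The sketch below is illustrative: the rates, mission time, and function names are assumptions, the closed form is the solution of those four equations for $\lambda_D > 0$, and the integration is a plain forward-Euler scheme.

```python
import math


def standby_closed_form(lam, lam_d, t):
    """Closed-form (P3, P2, P1, P0) for the 1 of 3 standby system, lam_d > 0."""
    a = lam + 2.0 * lam_d            # exit rate of state 3
    b = lam + lam_d                  # exit rate of state 2
    p3 = math.exp(-a * t)
    p2 = (a / lam_d) * (math.exp(-b * t) - math.exp(-a * t))
    p1 = (a * b / (2.0 * lam_d ** 2)) * (
        math.exp(-a * t) + math.exp(-lam * t) - 2.0 * math.exp(-b * t)
    )
    return p3, p2, p1, 1.0 - p3 - p2 - p1


def standby_by_euler(lam, lam_d, t_end, steps=50000):
    """Forward-Euler integration of (D-36)-(D-39) from P3(0) = 1."""
    a, b = lam + 2.0 * lam_d, lam + lam_d
    dt = t_end / steps
    p3, p2, p1, p0 = 1.0, 0.0, 0.0, 0.0
    for _ in range(steps):
        d3 = -a * p3                 # (D-36)
        d2 = -b * p2 + a * p3        # (D-37)
        d1 = -lam * p1 + b * p2      # (D-38)
        d0 = lam * p1                # (D-39)
        p3, p2, p1, p0 = p3 + d3 * dt, p2 + d2 * dt, p1 + d1 * dt, p0 + d0 * dt
    return p3, p2, p1, p0


lam, lam_d, t_end = 0.01, 0.002, 50.0      # hypothetical rates (per hour) and time
exact = standby_closed_form(lam, lam_d, t_end)
numeric = standby_by_euler(lam, lam_d, t_end)
system_reliability = exact[0] + exact[1] + exact[2]   # R(t) = P3 + P2 + P1
print(exact, numeric, system_reliability)
```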


If $M$ is small compared to $\theta$, equations (D-41)
and (D-42) can be simplified to:

MTBF  *  (rNj)  >  MN  R+I/N  (D-45) 


D.3.5  MTBF of R of N Identical Repairable
Elements in Parallel with Restricted
Repair - The Einhorn Equations


The birth and death equations are a natural tool to obtain the MTBF of R out of N identical repairable elements in parallel. A special case (4 of 6 restricted repair) has already been calculated. We develop here the restricted repair solution for the general R of N case.

A Markov graph for this situation is shown in figure D-11.

Under steady state conditions the reliability $R_R$ of state R (R elements up) is constant. The MTBF of state R is

$$\frac{1}{R\lambda R_R}.$$

Since under steady state conditions all rates of change of the state reliabilities are zero,

at state $N$, $R_N(N\lambda) = R_{N-1}(\mu)$  (D-40)

at state $N-1$, $R_{N-1}((N-1)\lambda) = R_{N-2}(\mu)$  (D-41)

. . .

at state $R+1$, $R_{R+1}((R+1)\lambda) = R_R(\mu)$.  (D-42)


Solving these equations recursively and letting $\theta = 1/\lambda$, $M = 1/\mu$ be the mean time to failure and mean time to repair, respectively, of a single element, gives:


(D-43) 

(D-44) 


$$\text{MTTR} \approx M/(N-R+1) \qquad \text{(D-46)}$$

Equations (D-45) and (D-46) are the Einhorn [6] approximation, which was derived originally by its author from considerations of quorum probabilities.
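The recursion of (D-40) through (D-42) is easy to carry out numerically. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the state reliabilities are normalized over the success states R through N (the same normalization used in the 4 of 6 example, $R_4 + R_5 + R_6 = 1$), and the MTBF is taken as $1/(R\lambda R_R)$. It evaluates the exact steady-state recursion, not the approximations (D-45) and (D-46).

```python
def r_of_n_restricted_mtbf(n, r, lam, mu):
    """Steady-state MTBF of an R of N system with a single repairman.

    Applies R_k * (k*lam) = R_(k-1) * mu for k = N, N-1, ..., R+1,
    normalizes over states R..N, and returns 1/(R*lam*R_R).
    """
    if not 1 <= r <= n:
        raise ValueError("require 1 <= r <= n")
    weights = {n: 1.0}                       # state reliabilities in units of R_N
    for k in range(n, r, -1):
        weights[k - 1] = weights[k] * k * lam / mu
    total = sum(weights.values())
    reliability_of_state_r = weights[r] / total
    return 1.0 / (r * lam * reliability_of_state_r)


lam, mu = 0.001, 0.1                         # hypothetical rates per hour
mtbf_4_of_6 = r_of_n_restricted_mtbf(6, 4, lam, mu)

# Eliminating R6 and R5 by hand for the 4 of 6 restricted case gives:
mtbf_check = (mu ** 2 + 6.0 * lam * mu + 30.0 * lam ** 2) / (120.0 * lam ** 3)
print(mtbf_4_of_6, mtbf_check)
```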
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Appendix  E 

STATISTICAL  TABLES 


This Appendix contains tables useful in reliability and availability evaluation and references to more extensive tables. It is organized:

E.1  TABLES USEFUL FOR RELIABILITY CALCULATIONS

E.1.1  Chi-Square Tables
(80% Confidence Level)

E.1.2  Reliability Lower Bound Tables
(80% Confidence Level)

E.1.3  Binomial Tables
(80% Confidence Level)

E.2  TOLERANCE FACTORS FOR THE NORMAL DISTRIBUTION

E.3  TABLES USEFUL FOR AVAILABILITY CALCULATIONS

E.3.1  The F Distribution
(80% Confidence Level)

E.3.2  The α Distribution
(80% Confidence Level)

E.4  MTBF TABLES FOR m OUT OF n WITH AND WITHOUT REPAIR

E.5  REFERENCES


E.1  TABLES USEFUL FOR RELIABILITY
CALCULATIONS

This paragraph provides tables useful in calculating 80% lower bounds on reliability when the underlying process is Poisson, Exponential, or Binomial.

E.1.1  Chi-Square Tables

Figure E-1 presents chi-square tables at the 80% confidence level. These tables may be used to obtain the failure rate or MTBF 80% confidence bound for exponential time to failure data.

Since figure E-1 provides chi-square values at the 80% confidence level, 80% upper bounds on failure rate or 80% lower bounds on MTBF are obtained:

$$\lambda_{.80} = \frac{\chi^2_{.80;\,f}}{2T} \qquad \text{(E-1)}$$

where $f$, the degrees of freedom, is $2x+2$ when the test is truncated by time (Type I life censoring test, § 5.1.1.b) and $f = 2x$ when the test is terminated at a predetermined number of failures (Type II life censoring test, § 5.1.1.c).

Example

In a Type I test 5 failures are observed in 500 hours of test; find the upper bound on failure rate (80% confidence level).

Calculate $f = 2x + 2 = 2(5) + 2 = 12$.

Find the $\chi^2_{.80}$ value in figure E-1 = 15.812.

Solve equation E-1:

$$\lambda_{.80} = \frac{\chi^2_{.80;\,12}}{2T} = \frac{15.812}{2(500)} = .015812 \text{ failures/hour}$$

$$\theta_{.80} = \frac{2T}{\chi^2_{.80;\,12}} = \frac{2(500)}{15.812} = 63.24 \text{ hours}$$

The reliability lower bound for a one hour mission would be calculated as
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f 

CH1-SQU 

f 

CHI-SQU 

f 

CHI-SQU 

f 

CHI-SQU 

2 

3.219 

102 

1 13.786 

202 

218.693 

302 

322.466 

4 

5.989 

104 

115.903 

204 

220.777 

304 

324.534 

6 

8.558 

106 

118.020 

206 

222.860 

306 

326.602 

8 

11.030 

108 

120.135 

208 

224.943 

308 

328.670 

 10   13.442    110  122.250    210  227.025    310  330.738
 12   15.812    112  124.363    212  229.107    312  332.806
 14   18.151    114  126.475    214  231.189    314  334.873
 16   20.465    116  128.586    216  233.270    316  336.940
 18   22.760    118  130.697    218  235.351    318  339.007
 20   25.038    120  132.806    220  237.432    320  341.074
 22   27.301    122  134.915    222  239.512    322  343.140
 24   29.553    124  137.022    224  241.592    324  345.207
 26   31.795    126  139.129    226  243.671    326  347.273
 28   34.027    128  141.235    228  245.750    328  349.339
 30   36.250    130  143.340    230  247.829    330  351.404
 32   38.466    132  145.444    232  249.907    332  353.470
 34   40.676    134  147.548    234  251.986    334  355.535
 36   42.879    136  149.651    236  254.063    336  357.600
 38   45.076    138  151.753    238  256.141    338  359.665
 40   47.269    140  153.854    240  258.218    340  361.730
 42   49.456    142  155.954    242  260.295    342  363.794
 44   51.639    144  158.054    244  262.371    344  365.859
 46   53.818    146  160.153    246  264.447    346  367.923
 48   55.993    148  162.251    248  266.523    348  369.987
 50   58.164    150  164.349    250  268.599    350  372.051
 52   60.332    152  166.446    252  270.674    352  374.114
 54   62.496    154  168.543    254  272.749    354  376.178
 56   64.658    156  170.639    256  274.823    356  378.241
 58   66.816    158  172.734    258  276.898    358  380.304
 60   68.972    160  174.828    260  278.972    360  382.367
 62   71.125    162  176.922    262  281.046    362  384.429
 64   73.276    164  179.016    264  283.119    364  386.492
 66   75.424    166  181.109    266  285.192    366  388.554
 68   77.571    168  183.201    268  287.265    368  390.617
 70   79.715    170  185.293    270  289.338    370  392.679
 72   81.857    172  187.384    272  291.410    372  394.740
 74   83.997    174  189.474    274  293.482    374  396.802
 76   86.135    176  191.565    276  295.554    376  398.864
 78   88.271    178  193.654    278  297.626    378  400.925
 80   90.405    180  195.743    280  299.697    380  402.986
 82   92.538    182  197.832    282  301.768    382  405.047
 84   94.669    184  199.920    284  303.839    384  407.108
 86   96.799    186  202.008    286  305.910    386  409.169
 88   98.927    188  204.095    288  307.980    388  411.229
 90  101.054    190  206.182    290  310.050    390  413.290
 92  103.179    192  208.268    292  312.120    392  415.350
 94  105.303    194  210.354    294  314.190    394  417.410
 96  107.425    196  212.439    296  316.259    396  419.470
 98  109.547    198  214.524    298  318.328    398  421.530
100  111.667    200  216.609    300  320.397    400  423.589

Figure E-1. Chi-Square Distribution at 80% Confidence Level, Degrees of Freedom from 2 to 400
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a  .80  * 


where: 


$R_{.80} = e^{-(0.015812)(1)} = .9843$

More extensive chi-square tables are available from many sources, e.g., [1].

E.1.2  Reliability  Lower  Bound  Tables 

The reliability tables shown in figure E-2 solve the problem illustrated in § E.1.1 directly.

Figure E-2 is entered at 5 failures and a test time of 500 hours. The 80% lower bound on reliability is read directly as:

$R_{.80} = .9843$

which  agrees  with  the  result  obtained  in 
§  E.1.1. 
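
The arithmetic behind this table lookup can be sketched in a few lines. This is an illustration only: it assumes the chi-square value 15.812 (12 degrees of freedom in figure E-1, taken here as 2x + 2 for x = 5 failures in a time-truncated test) and a mission time of 1 hour, which is inferred from the exponent in § E.1.1. The function and variable names are hypothetical.

```python
import math

def reliability_lower_bound(chi_square, test_time, mission_time):
    """Exponential reliability lower bound from a tabulated chi-square value."""
    # Upper bound on the failure rate: chi-square / (2 * cumulative test time).
    lambda_upper = chi_square / (2.0 * test_time)
    return math.exp(-lambda_upper * mission_time)

# Figure E-1 value for 12 degrees of freedom (assumed: 5 failures, time-truncated).
CHI_SQUARE_80_DF12 = 15.812
r_80 = reliability_lower_bound(CHI_SQUARE_80_DF12, test_time=500.0, mission_time=1.0)
print(f"{r_80:.4f}")
```

Under these assumptions the result matches the value read from figure E-2 to four decimal places.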

A  complete  discussion  of  the  uses  and 
derivation  of  the  reliability  tables  and  a  far 
more  extensive  set  of  tables  are  available, 
e.g.,  [2]. 

E.1.3  Binomial  Tables 


Figure E-3 presents binomial tables at the 80% confidence level. The 80% lower bound on reliability is read directly from these tables at the intersection of the sample size (N) and the number of failures (X). For example, if 2 failures are obtained in 40 trials, the estimate is $\hat{R} = 38/40 = 0.9500$ and the 80% lower bound is $R_{.80} = 0.8960$ (from figure E-3).

Some extensions of the tables are discussed:

(a) For N > 40 and N < 120, linear interpolation of the table in figure E-3 for odd N will yield a maximum error of 0.0001.

(b) For X = 0

$R_{.80} = (0.20)^{1/N}$    (E-2)

(c) For N > 120 and X ≠ 0

$R_{.80} = \dfrac{S}{S + (X+1)e^{2w}}$    (E-3)

where:

N = Number of trials
X = Number of failures
S = N - X

$H = \dfrac{2}{\dfrac{1}{2S-1} + \dfrac{1}{2X+1}}$

$w = \dfrac{.8416\sqrt{H - .3820}}{H} - \left(\dfrac{1}{2X+1} - \dfrac{1}{2S-1}\right)\left(.4514 - \dfrac{2}{3H}\right)$

$|\epsilon| < .0008$

For example, if equation E-3 is used for N = 120 and X = 5 then

$R_{.80} = \dfrac{115}{115 + (5+1)e^{2(.14569)}}$

$R_{.80} = .9347$ (figure E-3 gives .9349)

More extensive binomial tables are available, e.g., [1].
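
The large-sample approximation can be checked numerically. The sketch below assumes the reading of equation (E-3) in which $R_{.80} = S/[S + (X+1)e^{2w}]$, with $H = 2/[1/(2S-1) + 1/(2X+1)]$ and $w = .8416\sqrt{H - .3820}/H - [1/(2X+1) - 1/(2S-1)](.4514 - 2/(3H))$. That reading is a reconstruction, adopted here because it reproduces the printed exponent 2(.14569) and the result .9347 for N = 120, X = 5. The function name is hypothetical.

```python
import math

def binomial_lower_bound_80(n_trials, failures):
    """80% lower confidence bound on reliability via equations (E-2)/(E-3)."""
    successes = n_trials - failures
    if failures == 0:
        # Equation (E-2): no failures observed.
        return 0.20 ** (1.0 / n_trials)
    inv_s = 1.0 / (2 * successes - 1)
    inv_x = 1.0 / (2 * failures + 1)
    h = 2.0 / (inv_s + inv_x)
    # Constants .8416, .3820 and .4514 are the 80% values printed in the text.
    w = 0.8416 * math.sqrt(h - 0.3820) / h - (inv_x - inv_s) * (0.4514 - 2.0 / (3.0 * h))
    # Equation (E-3), as reconstructed above.
    return successes / (successes + (failures + 1) * math.exp(2.0 * w))

print(f"{binomial_lower_bound_80(120, 5):.4f}")
```

The text restricts (E-3) to N > 120 and X ≠ 0; the sketch does not enforce that range, so the caller is responsible for staying within it.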


E.2  TOLERANCE  FACTORS  FOR  THE 
NORMAL  DISTRIBUTION 

Figures E-4 through E-7 present one-sided and two-sided tolerance factors for the Normal Distribution at 50% and 80% confidence levels.

As an example of the use of these tables, the method of § 5.4.3 will be used.

Assume, as a parallel to the example connected with expression (5-41), that n = 5, $\bar{x}$ = 8, s = 0.484, LCLS = 6, UCLS = 10; then K = 4.09, and the two-sided table of figure E-7 yields a reliability of .975 for K = 3.8403 and of .99 for K = 4.4133. Linear interpolation yields R = 0.982 for K = 4.09. More extensive tables are available, e.g., [3].
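
The interpolation step in this example can be written out as follows. This is a minimal sketch using only the two figure E-7 entries quoted in the text for n = 5; the function name is hypothetical.

```python
def interpolate_reliability(k, k_low, r_low, k_high, r_high):
    """Linear interpolation of reliability between two tabulated K factors."""
    fraction = (k - k_low) / (k_high - k_low)
    return r_low + fraction * (r_high - r_low)

# Figure E-7 entries for n = 5 quoted in the text: K = 3.8403 and K = 4.4133.
r = interpolate_reliability(4.09, 3.8403, 0.975, 4.4133, 0.99)
print(f"{r:.3f}")
```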

E.3  TABLES  USEFUL  FOR 

AVAILABILITY  CALCULATIONS 


E.3.1 The F Distribution

Figure E-8 presents the F distribution at the 80% confidence level.


[Figure E-2 panels at 80% confidence; horizontal axis: normalized test time.]

**If testing is truncated by failure, enter 1 failure less than those observed to obtain the Reliability Lower Bound.
Figure  E-2.  Reliability  Lower  Bound  for  Exponential  Components  with  Failures  Truncated  by  Test  Time**  (Continued) 
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Reliability  Lower  Bound  for  Exponential  Components  with  Failures  Truncated  by  Test  Time**  (Continued) 
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Figure  E-2.  Reliability  Lower  Bound  for  Exponential  Components  with  Failures  Truncated  by  Test  Time**  (Continued) 
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Figure  E-3.  Binomial  Tables  (80%  confidence)  (Continued) 
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Figure  E-4.  One-Sided  Tolerance  Factors  for  the  Normal  Distribution  at  50%  Confidence 
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0.8066 

1.4461 
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RELIABILITY 


    N     .75      .90      .95     .975      .99     .999    .9999   .99999
   62   0.7974   1.4344   1.8199   2.1560   2.5480   3.3682   4.0453   4.6337
   64   0.7954   1.4319   1.8170   2.1526   2.5442   3.3635   4.0396   4.6274
   66   0.7935   1.4294   1.8142   2.1495   2.5406   3.3589   4.0343   4.6213
   68   0.7917   1.4271   1.8115   2.1464   2.5372   3.3546   4.0292   4.6156
   70   0.7899   1.4249   1.8089   2.1435   2.5339   3.3505   4.0244   4.6101
   80   0.7822   1.4151   1.7976   2.1309   2.5195   3.3324   4.0031   4.5860
   90   0.7758   1.4070   1.7884   2.1205   2.5077   3.3175   3.9856   4.5662
  100   0.7704   1.4003   1.7806   2.1117   2.4978   3.3051   3.9710   4.5497
  125   0.7600   1.3872   1.7656   2.0949   2.4787   3.2811   3.9428   4.5177
  150   0.7524   1.3776   1.7546   2.0826   2.4648   3.2635   3.9222   4.4945
  175   0.7465   1.3703   1.7461   2.0731   2.4540   3.2501   3.9064   4.4766
  200   0.7417   1.3643   1.7393   2.0655   2.4454   3.2393   3.8937   4.4623
  225   0.7378   1.3595   1.7337   2.0592   2.4384   3.2304   3.8833   4.4505
  250   0.7345   1.3553   1.7290   2.0540   2.4324   3.2229   3.8745   4.4406
  275   0.7317   1.3518   1.7250   2.0494   2.4273   3.2165   3.8670   4.4321
  300   0.7292   1.3488   1.7215   2.0455   2.4229   3.2110   3.8605   4.4247

325 

0.7270 

1.3460 

1.7184 

2.0420 

2.4190 
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3.8547 

4.4182 
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0.725 1 

1 .3436 

1.7156 

2.0390 

2.4155 

3.2017 

3.8496 

4.4124 
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0.7233 

1.3415 

1.7132 

2.0362 

2.4124 

3.1978 

3.8451 

4.4072 

  400   0.7217   1.3395   1.7110   2.0337   2.4096   3.1943   3.8409   4.4026
  425   0.7203   1.3378   1.7089   2.0315   2.4070   3.1911   3.8372   4.3983
  450   0.7190   1.3362   1.7071   2.0294   2.4047   3.1882   3.8338   4.3945
  475   0.7178   1.3347   1.7054   2.0275   2.4025   3.1855   3.8306   4.3909
  500   0.7167   1.3333   1.7038   2.0258   2.4006   3.1830   3.8277   4.3876
  525   0.7157   1.3320   1.7024   2.0241   2.3987   3.1807   3.8250   4.3846
  550   0.7147   1.3309   1.7010   2.0226   2.3970   3.1786   3.8225   4.3818
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0.7138 

1 .3297 

1.6998 

2.0212 

2.3954 

3.1766 

3.8202 
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1 .3287 

1 .6986 

2.0199 

2.3939 

3.1747 
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13277 

1 .6975 

2.0187 
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3.1730 
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0.7078 

1.3223 

1.6913 

2.0117 

2.3847 

3.1632 

3.8045 
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850 

0.7068 

1.3211

1.6898

2.0102 

2.3829 

3.1610 

3.8019 

4.3584 

900 

0.7058 

1.3199 

1 .6886 

2.0087 

2.3813 

3.1589 
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4.3557 

950 

0.7050 
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1 .6874 

2.0074 

2.3798 

3.1571 

3.7973 

4.3532 

 1000   0.7042   1.3179   1.6863   2.0062   2.3784   3.1553   3.7953   4.3510
 1500   0.6987   1.3112   1.6786   1.9976   2.3687   3.1432   3.7811   4.3349
 2000   0.6955   1.3072   1.6740   1.9925   2.3630   3.1360   3.7726   4.3254
 3000   0.6916   1.3024   1.6686   1.9865   2.3562   3.1275   3.7627   4.3142
 4000   0.6893   1.2996   1.6654   1.9829   2.3522   3.1225   3.7568   4.3075
 5000   0.6877   1.2977   1.6632   1.9804   2.3494   3.1191   3.7528   4.3030
10000   0.6838   1.2930   1.6578   1.9744   2.3426   3.1106   3.7428   4.2917

Figure E-5. One-Sided Tolerance Factors for the Normal Distribution at 80% Confidence
with  Sample  Size  n  from  2  to  10,000  (Continued) 
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RELIABILITY 
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.75 

.90 

.95 

975 

.99 

999 

.9999 

.99999 

    2   2.1319   3.0483   3.6323   4.1539   4.7737   6.0982   7.2103   8.1858
    3   1.6120   2.3049   2.7465   3.1409   3.6095   4.6110   5.4519   6.1896
    4   1.4573   2.0837   2.4829   2.8394   3.2631   4.1684   4.9286   5.5954
    5   1.3813   1.9751   2.3535   2.6916   3.0930   3.9512   4.6718   5.3039
    6   1.3359   1.9101   2.2760   2.6029   2.9912   3.8212   4.5180   5.1293
    7   1.3055   1.8667   2.2243   2.5437   2.9232   3.7343   4.4152   5.0126
    8   1.2837   1.8355   2.1872   2.5012   2.8744   3.6720   4.3416   4.9290
    9   1.2673   1.8121   2.1593   2.4693   2.8377   3.6251   4.2862   4.8661
   10   1.2545   1.7938   2.1375   2.4444   2.8091   3.5886   4.2430   4.8170
   11   1.2443   1.7792   2.1200   2.4244   2.7862   3.5592   4.2083   4.7777
   12   1.2359   1.7671   2.1057   2.4080   2.7673   3.5352   4.1798   4.7454
   13   1.2288   1.7571   2.0937   2.3944   2.7516   3.5151   4.1561   4.7184
   14   1.2229   1.7486   2.0836   2.3827   2.7383   3.4980   4.1359   4.6955
   15   1.2178   1.7413   2.0748   2.3728   2.7268   3.4834   4.1186   4.6759
   16   1.2133   1.7349   2.0673   2.3641   2.7169   3.4707   4.1036   4.6588
   17   1.2094   1.7294   2.0607   2.3566   2.7082   3.4596   4.0905   4.6439
   18   1.2060   1.7244   2.0548   2.3499   2.7005   3.4497   4.0789   4.6307
   19   1.2030   1.7201   2.0496   2.3439   2.6936   3.4410   4.0685   4.6190
   20   1.2002   1.7162   2.0449   2.3386   2.6875   3.4332   4.0592   4.6084
   21   1.1977   1.7126   2.0407   2.3338   2.6820   3.4261   4.0509   4.5990
   22   1.1955   1.7094   2.0369   2.3294   2.6770   3.4197   4.0433   4.5904
   23   1.1935   1.7065   2.0334   2.3254   2.6724   3.4139   4.0365   4.5826
   24   1.1916   1.7039   2.0303   2.3216   2.6682   3.4086   4.0302   4.5754
   25   1.1899   1.7014   2.0274   2.3185   2.6644   3.4037   4.0244   4.5689
   26   1.1883   1.6992   2.0247   2.3154   2.6609   3.3992   4.0191   4.5629
   27   1.1869   1.6971   2.0222   2.3126   2.6577   3.3951   4.0142   4.5573
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1.1856 

1.6952 

2.0199 

2.3100 

2.6547 

33912 

4.0097 

4.5521 

   29   1.1843   1.6934   2.0178   2.3076   2.6519   3.3877   4.0054   4.5474
   30   1.1831   1.6917   2.0158   2.3053   2.6493   3.3843   4.0015   4.5429

31 

1.1821 

1 .6902 

2.0140 
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3.9978 
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1.1810 

1.6887 

2.0123 
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1 .6838 

2.0063 
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1.1761 
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2.0038 
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2.6335 

3.3642 

3.9777 

4.5158 

40 

1.1748 

1 .6798 

2.0016 

2.2890 

2.6305 
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3.9732 
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1.6781 

1.9996 
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2.6279 

3.3570 

3.9692 
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2.2846 
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2.6177 

3.3440 
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2.2765 

2.6161 
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39515 

4.4861 

56 

1.1677 
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Figure  E-6.  Two-Sided  Tolerance  Factors  for  the  Normal  DistriNrtion  at  50 9r  Confidence 
with  Sample  Size  n  from  2  to  10,000 
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RELIABILITY 


    N     .75      .90      .95     .975      .99     .999    .9999
   62   1.1660   1.6672   1.9866   2.2719   2.6109   3.3353   3.9435
   64   1.1655   1.6665   1.9858   2.2709   2.6098   3.3339   3.9419
   66   1.1650   1.6659   1.9850   2.2700   2.6087   3.3326   3.9403
   68   1.1646   1.6652   1.9843   2.2692   2.6078   3.3313   3.9388
   70   1.1642   1.6646   1.9836   2.2684   2.6068   3.3301   3.9374
   80   1.1624   1.6622   1.9806   2.2650   2.6029   3.3251   3.9315
   90   1.1611   1.6602   1.9783   2.2623   2.5999   3.3213   3.9269
  100   1.1600   1.6587   1.9764   2.2602   2.5975   3.3182   3.9233
  125   1.1581   1.6559   1.9731   2.2564   2.5931   3.3126   3.9167
  150   1.1568   1.6540   1.9709   2.2539   2.5902   3.3089   3.9123
  175   1.1559   1.6527   1.9693   2.2521   2.5881   3.3063   3.9092
  200   1.1552   1.6517   1.9682   2.2508   2.5866   3.3043   3.9069
  225   1.1546   1.6510   1.9672   2.2497   2.5854   3.3028   3.9051
  250   1.1542   1.6504   1.9665   2.2489   2.5844   3.3015   3.9036
  275   1.1538   1.6499   1.9659   2.2482   2.5837   3.3005   3.9024
  300   1.1536   1.6494   1.9654   2.2476   2.5830   3.2997   3.9014
  325   1.1533   1.6491   1.9650   2.2472   2.5824   3.2990   3.9006
  350   1.1531   1.6488   1.9646   2.2468   2.5820   3.2984   3.8999
  375   1.1529   1.6485   1.9643   2.2464   2.5816   3.2979   3.8993
  400   1.1528   1.6483   1.9641   2.2461   2.5812   3.2974   3.8987
  425   1.1526   1.6481   1.9638   2.2458   2.5809   3.2970   3.8982
  450   1.1525   1.6479   1.9636   2.2456   2.5806   3.2966   3.8978
  475   1.1524   1.6477   1.9634   2.2453   2.5804   3.2963   3.8974
  500   1.1523   1.6476   1.9632   2.2451   2.5801   3.2960   3.8971
  525   1.1522   1.6475   1.9631   2.2450   2.5799   3.2958   3.8968
  550   1.1521   1.6473   1.9629   2.2448   2.5797   3.2955   3.8965
  575   1.1520   1.6472   1.9628   2.2447   2.5796   3.2953   3.8962
  600   1.1519   1.6471   1.9627   2.2445   2.5794   3.2951   3.8960
  625   1.1519   1.6470   1.9626   2.2444   2.5793   3.2949   3.8958
  650   1.1518   1.6470   1.9625   2.2443   2.5791   3.2948   3.8956
  675   1.1518   1.6469   1.9624   2.2442   2.5790   3.2946   3.8954
  700   1.1517   1.6468   1.9623   2.2441   2.5789   3.2944   3.8952
  725   1.1517   1.6467   1.9622   2.2440   2.5788   3.2943   3.8951
  750   1.1516   1.6467   1.9621   2.2439   2.5787   3.2942   3.8949
  800   1.1515   1.6466   1.9620   2.2437   2.5785   3.2940   3.8947
  850   1.1515   1.6465   1.9619   2.2436   2.5784   3.2938   3.8944
  900   1.1514   1.6464   1.9618   2.2435   2.5782   3.2936   3.8942
  950   1.1514   1.6463   1.9617   2.2434   2.5781   3.2934   3.8940
 1000   1.1513   1.6462   1.9616   2.2433   2.5780   3.2933   3.8938
 1500   1.1510   1.6458   1.9611   2.2426   2.5773   3.2924   3.8928
 2000   1.1508   1.6455   1.9608   2.2423   2.5769   3.2919   3.8922
 3000   1.1507   1.6453   1.9605   2.2420   2.5765   3.2914   3.8917
 4000   1.1506   1.6452   1.9604   2.2419   2.5764   3.2912   3.8914
 5000   1.1505   1.6451   1.9603   2.2418   2.5763   3.2911   3.8912
10000   1.1504   1.6450   1.9601   2.2416   2.5760   3.2908   3.8909

Figure E-6. Two-Sided Tolerance Factors for the Normal Distribution at 50% Confidence
with  Sample  Size  n  from  2  to  10,000  (Continued) 
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RELIABILITY 


    N     .75      .90      .95     .975      .99     .999    .9999   .99999
    2   5.6758   8.1156   9.6703  11.0590  12.7090  16.2353  19.1960  21.9731
    3   2.8411   4.0624   4.8406   5.5357   6.3617   8.1268   9.6088  10.9089
    4   2.2357   3.1968   3.8093   4.3562   5.0062   6.3953   7.5615   8.5846
    5   1.9709   2.8182   3.3581   3.8403   4.4133   5.6378   6.6659   7.5678
    6   1.8207   2.6033   3.1021   3.5475   4.0768   5.2080   6.1577   6.9908

7 

1 .7230 

2.4637 
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Figure  E-7.  Two-Sided  Tolerance  Factors  for  the  Normal  Distribution  at  809f  Confidence 
with  Sample  Size  n  from  2  to  10,000 
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Figure  E-7.  Two-Sided  Tolerance  Factors  for  the  Norma)  Distribution  at  80%  Confidence 
with  Sample  Size  n  from  2  to  10.000  (Continued) 
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Figure  E-8.  F  Distribution  (80%  Confidence)  (Continued) 
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The  F  Distribution,  figure  E-8  is  the  dis¬ 
tribution  of  the  quotient  of  two  chi-square 
distributions. 

Its  use  has  already  been  illustrated  in 
§  7.2.6  in  the  determination  of  availability 
bounds  when  both  times  to  failure  and  times 
to  repair  are  exponentially  distributed. 

Another use, illustrated in § 5.4.6.8, is in the acceptance or rejection of the Bayesian $\lambda$ prior when the prior and posterior data both arise from frequency data and are to be tested for the hypothesis that they come from the same exponential distribution.

Yet a third use of the F distribution, not illustrated in the manual, is in testing the hypothesis that the 1st time to failure of an item is consistent with subsequent times to failure. This is important to know for repairable items since a new item could be very unreliable, for instance, until repaired at least once.

The test statistic is

$u = \sum_{i=2}^{x} t_i \Big/ \left[(x-1)\,t_1\right]$

If $F_{\alpha;2;2x-2} < u$ then there is evidence that $t_1$ represents an abnormally short time to 1st failure.

In the expression $F_{\alpha;2;2x-2}$, $\alpha$ represents the critical region (always 20% for figure E-8), 2 represents $f_1$, the number of degrees of freedom associated with $t_1$ (one failure), and 2x-2 represents $f_2$, the number of degrees of freedom associated with $t_2$ through $t_x$ or x-1 failures.

If, for instance, $t_1 = 20$ hours, and

$\sum_{i=2}^{12} t_i = 980$ hours,

then,

$u = \dfrac{980}{(11)(20)} = 4.455$

But $F_{0.20;2;22} = 1.7331$, and one must conclude that the time to first failure is inconsistent with subsequent times to failure at the 20% level of significance.
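
The test above can be sketched numerically. The block computes u for the worked example and, as a cross-check on the tabulated 1.7331, evaluates the upper 20% point of an F distribution with 2 numerator degrees of freedom from its closed form, $(\nu/2)(\alpha^{-2/\nu} - 1)$. That closed form is a standard property of F(2, ν) and is not taken from the manual. The function names are hypothetical, and the individual times after the first are placeholders, since the text gives only their sum.

```python
def first_failure_statistic(times):
    """u = (sum of t_2..t_x) / ((x - 1) * t_1) for ordered times to failure."""
    first, rest = times[0], times[1:]
    return sum(rest) / (len(rest) * first)

def f_upper_point_2_df(alpha, denominator_df):
    """Upper-tail alpha point of F with 2 numerator degrees of freedom."""
    # For F(2, v): P(F > f) = (1 + 2f/v) ** (-v/2), which inverts in closed form.
    return (denominator_df / 2.0) * (alpha ** (-2.0 / denominator_df) - 1.0)

# Placeholder times: only t_1 = 20 hours and the sum of 980 hours come from the text.
times = [20.0] + [980.0 / 11.0] * 11
u = first_failure_statistic(times)
f_crit = f_upper_point_2_df(0.20, 22)
print(round(u, 3), round(f_crit, 4), u > f_crit)
```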

The  F  distribution  could  similarly  be  used 
to  test  the  hypothesis  of  abnormally  long 
times  to  first  failure. 

More extensive F tables are found in [1].


E.3.2  The  a  Distribution 

The  a  distribution,  figure  E-9,  is  useful  in 
calculating  availability  bounds  when  times  to 
failure  are  exponentially  distributed,  and 
times  to  repair  are  lognormally  distributed. 

Example  of  Lower  Availability  Bound 
Computation 

It is shown in [4] and [5] that a lower bound on availability for a component exhibiting exponential times to failure and lognormal times to repair is given by:

$A_L = 1/[1 + (\lambda/\mu)_U]$    (E-4)

where

$(\lambda/\mu)_U = (\hat{\lambda}/\hat{\mu})\,\dfrac{e^{\sigma^2/2}}{2x}\,a_{\gamma;\,m/\sigma^2;\,x}$    (E-5)

The variance $\sigma^2$ of Naperian logarithms of repair times is assumed to be known. When this variance is unknown, it can be estimated directly for repair times $MC_i$, $i = 1, 2, \ldots, m$; $m > 2$ from formula (E-6):


The penalty for using the estimated variance $\hat{\sigma}^2$ rather than the true variance $\sigma^2$ is that the lower bound on availability tends to be optimistic (too high) if repair data are few.

In (E-5), $\hat{\lambda}$ represents a failure rate estimate x/T from exponential components tested to failure, with x the total number of failures observed and T the cumulative test time of all units on test; $\hat{\mu}$ represents a repair rate estimated from


where the $MC_i$ are the m observed repair times; $\sigma^2$, the variance of the Naperian logarithms of repair times, is assumed to be
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Fifure  E-9.  The  i  Distribution  (80%  Confidence) 
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known  or  is  estimated  from  expression 
(E-6); and $a_{\gamma;\,m/\sigma^2;\,x}$ is the "a" distribution at confidence $\gamma = 1 - \alpha$ considered in this appendix.

As an example of the application of (E-4), assume that the following parameters are known or estimated:

$\hat{\lambda} = 0.005$, $x = 10$, $\hat{\mu} = 0.393$, $\sigma = 0.413$, $m = 10$

$a_{\gamma;\,m/\sigma^2;\,x} = a_{0.80;60;10}$

which is not tabulated in figure E-9 of this report.

One can interpolate (harmonically for good results) between the tabular entries $a_{0.80;50;10}$ and $a_{0.80;100;10}$ to obtain $a_{0.80;60;10} = 25.55$.

A  harmonic  interpolate  is  obtained  by 
using  the  linear  interpolation  technique, 
after  the  marginal  values  are  changed  to 
their  reciprocals.  For  this  example,  the  end¬ 
points  (50  and  100)  become  .02  and  .01. 
The  desired  midpoint  (60)  becomes  .01666. 
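
The procedure just described amounts to linear interpolation in the reciprocal of the margin. A minimal sketch follows; the function name is hypothetical, and no entries from figure E-9 are used. The example computes only the weight that the entry at 50 receives when interpolating to 60 between 50 and 100.

```python
def harmonic_interpolate(x, x0, y0, x1, y1):
    """Interpolate linearly in 1/x between the points (x0, y0) and (x1, y1)."""
    # Weight placed on y0 after the margins are replaced by their reciprocals.
    weight = (1.0 / x - 1.0 / x1) / (1.0 / x0 - 1.0 / x1)
    return weight * y0 + (1.0 - weight) * y1

# Setting y0 = 1 and y1 = 0 returns the weight on the entry at 50 directly.
weight_at_50 = harmonic_interpolate(60, 50, 1.0, 100, 0.0)
print(round(weight_at_50, 4))
```

The entry at 50 therefore carries about two thirds of the weight, compared with 0.8 under ordinary linear interpolation.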

Both the original margins and their reciprocals correspond to the same tabular entries.

Note  that  interpolation  between  values  of 
N  =  x  (rows  of  the  table)  is  possible  too. 
Linear  interpolation  works  well  without  any 
transformation. 

Using the value of $a_{0.80;60;10}$ in (E-5), one obtains:

$(\lambda/\mu)_U = \dfrac{(0.005)(1.089)}{(0.393)(20)}\,(25.55) = 0.017$

$A_L = \dfrac{1}{1 + (\lambda/\mu)_U} = 0.982$

Notice that this 80% lower confidence bound is lower than the point estimate for component availability, which is:

$\hat{A} = \hat{\mu}/(\hat{\mu} + \hat{\lambda}) = 0.393/(0.393 + 0.005) = 0.987$

More extensive tables are found in [4].
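
The worked example can be reproduced with the short sketch below. It assumes that the upper bound on $\lambda/\mu$ has the form $(\hat{\lambda}/\hat{\mu})\,e^{\sigma^2/2}\,a/(2x)$. The 2x divisor is an assumption, adopted because it matches the printed factors (0.393)(20) and (25.55) in the example. The function name is hypothetical.

```python
import math

def availability_lower_bound(lam_hat, mu_hat, sigma, failures, a_value):
    """Return (A_L, upper bound on lambda/mu) under the assumed form of (E-5)."""
    # Assumed form: (lam/mu)_U = (lam_hat/mu_hat) * exp(sigma^2 / 2) * a / (2x).
    ratio_upper = (lam_hat / mu_hat) * math.exp(sigma ** 2 / 2.0) * a_value / (2.0 * failures)
    return 1.0 / (1.0 + ratio_upper), ratio_upper

# Parameters and the interpolated "a" value 25.55 are those quoted in the text.
a_lower, ratio_upper = availability_lower_bound(0.005, 0.393, 0.413, 10, 25.55)
a_point = 0.393 / (0.393 + 0.005)
print(f"{ratio_upper:.4f} {a_lower:.4f} {a_point:.4f}")
```

Carried to four decimals, the bound on the ratio is about 0.0177 and the availability bound about 0.9826, which suggests that the printed 0.017 and 0.982 are truncated rather than rounded.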


E.4  MTBF  TABLES  (m  of  n) 

Figure E-11 provides tables of the MTBF for various m-out-of-n configurations of identical items. The MTBF is provided under two options. The first is to permit the system to degrade until it fails (i.e., until fewer than the required m-out-of-n are operating). The second is to repair the first failure immediately. The basic configuration is shown in figure E-10.

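The MTBF for the first case is obtained by integration of the reliability function from zero to infinity.

$\mathrm{MTBF} = \int_0^\infty R(t)\,dt$

For example, in the case of 5-out-of-8 required we would have

$\mathrm{MTBF} = \int_0^\infty \left[56R(t)^5 - 140R(t)^6 + 120R(t)^7 - 35R(t)^8\right] dt$

Assuming a single item has a failure rate $\lambda$ and obeys the exponential failure law, we have:

$\mathrm{MTBF} = \int_0^\infty \left[56e^{-5\lambda t} - 140e^{-6\lambda t} + 120e^{-7\lambda t} - 35e^{-8\lambda t}\right] dt$

$= \dfrac{56}{5\lambda} - \dfrac{140}{6\lambda} + \dfrac{120}{7\lambda} - \dfrac{35}{8\lambda}$

$= \dfrac{9{,}408 - 19{,}600 + 14{,}400 - 3{,}675}{840\lambda}$

$\mathrm{MTBF} = 0.6345/\lambda$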

This result can also be obtained using:

$\mathrm{MTBF} = \sum_{i=m}^{n} \dfrac{1}{i\lambda}$

for 5 of 8

$\mathrm{MTBF} = \sum_{i=5}^{8} \dfrac{1}{i\lambda}$

$\mathrm{MTBF} = 0.6345/\lambda$
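
Both routes to the no-repair MTBF can be verified with exact rational arithmetic. The sketch below computes the coefficient c in MTBF = c/λ first from the sum of 1/i for i from m to n, and then by integrating the binomial reliability polynomial term by term, using the fact that the integral of $e^{-k\lambda t}$ from zero to infinity is $1/(k\lambda)$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def mtbf_coefficient_no_repair(m, n):
    """Coefficient c in MTBF = c / lambda for an m-out-of-n system, no repair."""
    return sum(Fraction(1, i) for i in range(m, n + 1))

def mtbf_coefficient_by_integration(m, n):
    """Same coefficient from integrating the binomial reliability polynomial."""
    total = Fraction(0)
    for k in range(m, n + 1):          # exactly k of n units surviving
        for j in range(n - k + 1):     # binomial expansion of (1 - R)^(n - k)
            # Each term C(n,k) C(n-k,j) (-1)^j R^(k+j) integrates to 1/((k+j) lambda).
            total += Fraction(comb(n, k) * comb(n - k, j) * (-1) ** j, k + j)
    return total

c = mtbf_coefficient_no_repair(5, 8)
print(c, float(c))
```

For 5-out-of-8 both functions give 533/840, in agreement with the numerator 9,408 - 19,600 + 14,400 - 3,675 = 533 over 840.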
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Figure  E-l  I .  MTBF  for  M  out  of  N  Redundant  Configuration  Without  and  With*  Repair 
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The  approximation: 


The  tabular  value  uses 


MTBF  * 


(E-7) 


is used when the first failed item is repaired immediately. $\mu$, the repair rate or reciprocal of MTTR, is normally much greater than the failure rate. When this maintenance strategy is used the major gain comes from the first redundant unit [i.e., (n-1)-out-of-n]. In cases where the repair rate, $\mu$, is much greater than the failure rate, $\lambda$, it is rare to require more than two excess units, as is shown by extending our example where 5-out-of-8 are required (i.e., n-m = 8-5 = 3 excess units).


$\mathrm{MTBF} = \dfrac{1}{280}\,\dfrac{\mu^3}{\lambda^4}$


$\mathrm{MTBF} = \dfrac{\mu}{2\lambda^2}$

in accordance with the approximate formula (Equation E-7).

MTBF tables are presented in figure E-11.

Example

Use the tables in figure E-11 to calculate MTBF when 5 of 8 units are required for success and $\lambda = 1 \times 10^{-3}$ failures/hour and MTTR = 1 hour ($\mu$ = 1 repair/hour).

From figure E-11

Without repair
MTBF = 0.634524/$\lambda$
MTBF = 634.524 hours

With repair (unrestricted)

MTBF = 0.003571 $\mu^3/\lambda^4$
MTBF = $3{,}571 \times 10^{6}$ hours
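
The with-repair figure can be checked with a short sketch. Equation (E-7) is not legible in this copy, so the coefficient below uses an assumed general form, $(n-m)!\,(m-1)!/n!$ multiplying $\mu^{n-m}/\lambda^{n-m+1}$. That form is adopted only because it reproduces the two coefficients the text does state: 1/280 for 5-out-of-8 and 1/2 for 1-out-of-2. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def repair_coefficient(m, n):
    """Coefficient c in MTBF ~ c * mu**(n-m) / lam**(n-m+1) (assumed form)."""
    return Fraction(factorial(n - m) * factorial(m - 1), factorial(n))

def mtbf_with_repair(m, n, lam, mu):
    """Approximate MTBF with immediate repair, valid only when mu >> lam."""
    return float(repair_coefficient(m, n)) * mu ** (n - m) / lam ** (n - m + 1)

# Example from the text: 5 of 8 required, lam = 1e-3 per hour, mu = 1 per hour.
coefficient = repair_coefficient(5, 8)
mtbf_hours = mtbf_with_repair(5, 8, 1e-3, 1.0)
print(coefficient, f"{mtbf_hours:.4g}")
```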

E.5  REFERENCES 

1.  Handbook  of  Mathematical  Functions, 
U.S.  Department  of  Commerce  National 
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2.  NAVORD  OD  30668  Reliability  Tables. 


which for a case where $\lambda = 1 \times 10^{-6}$ and MTTR = 1 hour would yield an MTBF of $3.57 \times 10^{21}$ hours.

The  tabular  information  is  based  upon  the 
assumption  that  the  failure  rate  of  a  single 
unit  obeys  the  exponential  failure  law  (the 
parallel  combination  does  not). 

A refined result for the MTBF in the one-out-of-two case with repair is obtained:

$\mathrm{MTBF} = \dfrac{3\lambda + \mu}{2\lambda^2}$


3.  CRC  Handbook  of  Tables  for  Probability 
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